- Dec 09, 2021
Stefan Hajnoczi authored
When the request free list is exhausted, the coroutine waits on q->free_req_queue for the next free request. Whenever a request is completed, a BH is scheduled to invoke nvme_free_req_queue_cb() and wake up waiting coroutines.

1. nvme_get_free_req() waits for a free request:

    while (q->free_req_head == -1) {
        ...
        trace_nvme_free_req_queue_wait(q->s, q->index);
        qemu_co_queue_wait(&q->free_req_queue, &q->lock);
        ...
    }

2. nvme_free_req_queue_cb() wakes up the coroutine:

    while (qemu_co_enter_next(&q->free_req_queue, &q->lock)) {
        ^--- infinite loop when free_req_head == -1
    }

nvme_free_req_queue_cb() and the coroutine form an infinite loop when q->free_req_head == -1.

Fix this by checking q->free_req_head in nvme_free_req_queue_cb(). If the free request list is exhausted, don't wake waiting coroutines. Eventually an in-flight request will complete and the BH will be scheduled again, guaranteeing forward progress.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20211208152246.244585-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
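The wake-up rule can be illustrated with a small, self-contained C model. It is a sketch under simplifying assumptions and is not QEMU code. ModelQueue and the model_*() functions are hypothetical. The model has no real coroutines or locks, and a woken waiter is assumed to take the head of the free list immediately.

```c
#include <assert.h>

#define MODEL_NUM_REQS 4

/* Hypothetical, heavily simplified model of a queue's request free list.
 * Not QEMU code: no coroutines, no locks. */
typedef struct {
    int next[MODEL_NUM_REQS]; /* free-list links, -1 terminates the list */
    int free_req_head;        /* first free request index, or -1 if none */
    int waiters;              /* coroutines parked on free_req_queue */
} ModelQueue;

/* A woken waiter pops the first free request. */
static int model_take_req(ModelQueue *q)
{
    int idx = q->free_req_head;

    assert(idx != -1);
    q->free_req_head = q->next[idx];
    return idx;
}

/* A completed request is pushed back onto the free list. */
static void model_put_req(ModelQueue *q, int idx)
{
    q->next[idx] = q->free_req_head;
    q->free_req_head = idx;
}

/* Model of the fixed BH callback: wake waiters only while a free request
 * exists. Without the free_req_head check, a woken waiter would find the
 * list still empty, wait again, and be re-entered forever. */
static int model_free_req_queue_cb(ModelQueue *q)
{
    int woken = 0;

    while (q->free_req_head != -1 && q->waiters > 0) {
        q->waiters--;
        (void)model_take_req(q); /* woken waiter grabs the request */
        woken++;
    }
    return woken;
}
```

When every request is in flight, the callback returns without waking anyone. Once one request is put back, exactly one waiter is woken and the list is empty again.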
- Nov 02, 2021
Philippe Mathieu-Daudé authored
Instead of duplicating code, extract the common helper to free a single queue.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211006164931.172349-4-philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Philippe Mathieu-Daudé authored
For debugging purposes, it is helpful to know the CQ/SQ pointers. We already have a trace event in nvme_free_queue_pair(); extend it to report these pointer addresses.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211006164931.172349-3-philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Philippe Mathieu-Daudé authored
Since commit 4d324c0b ("introduce QEMU_AUTO_VFREE"), buffers allocated by qemu_memalign() can be freed automatically by using the QEMU_AUTO_VFREE macro. Use it to simplify the code a bit.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211006164931.172349-2-philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
- Sep 29, 2021
Vladimir Sementsov-Ogievskiy authored
We are generally moving to int64_t for both offset and bytes parameters on all I/O paths.

The main motivation is implementing a 64-bit write_zeroes operation for fast zeroing of large disk chunks, up to the whole disk.

We chose a signed type to be consistent with off_t (which is signed) and to allow a signed return type (where a negative value means error).

So, convert the bytes parameter of the driver discard handlers to int64_t.

The only caller of all updated functions is bdrv_co_pdiscard() in block/io.c. It is already prepared to work with 64-bit requests, but passes at most min(bs->bl.max_pdiscard, INT_MAX) to the driver.

Let's look at all updated functions:

blkdebug: all calculations are still OK, thanks to bdrv_check_qiov_request(); both rule_check() and bdrv_co_pdiscard() are 64bit.
blklogwrites: passes to blk_log_writes_co_log(), which is 64bit.
blkreplay, copy-on-read, filter-compress: pass to bdrv_co_pdiscard(), OK.
copy-before-write: passes to bdrv_co_pdiscard(), which is 64bit, and to cbw_do_copy_before_write(), which is 64bit.
file-posix: one handler calls raw_account_discard(), which is 64bit, and both handlers call raw_do_pdiscard(). Update raw_do_pdiscard(), which passes to RawPosixAIOData::aio_nbytes, which is 64bit (and calls raw_account_discard()).
gluster: somehow, the third argument of glfs_discard_async() is size_t. Let's set max_pdiscard accordingly.
iscsi: iscsi_allocmap_set_invalid() is 64bit, !is_byte_request_lun_aligned() is 64bit. list.num is uint32_t. Let's clarify max_pdiscard and pdiscard_alignment.
mirror_top: passes to bdrv_mirror_top_do_write(), which is 64bit.
nbd: protocol limitation. max_pdiscard is already set strictly enough; keep it as is for now.
nvme: buf.nlb is uint32_t and we do a shift. So, add corresponding limits to nvme_refresh_limits().
preallocate: passes to bdrv_co_pdiscard(), which is 64bit.
rbd: passes to qemu_rbd_start_co(), which is 64bit.
qcow2: calculations are still OK, thanks to bdrv_check_qiov_request(); qcow2_cluster_discard() is 64bit.
raw-format: raw_adjust_offset() is 64bit, and so is bdrv_co_pdiscard().
throttle: passes to bdrv_co_pdiscard(), which is 64bit, and to throttle_group_co_io_limits_intercept(), which is 64bit as well.
test-block-iothread: the bytes argument is unused.

Great! Now all drivers are prepared to handle 64-bit discard requests, or else have explicit max_pdiscard limits.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210903102807.27127-11-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
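This is a rough illustration of the nvme limit mentioned above. The sketch assumes that the number of logical blocks is computed as bytes >> blkshift and stored unmodified in a 32-bit field. Under that assumption, the largest representable discard is UINT32_MAX blocks. sketch_max_pdiscard() and sketch_discard_nlb() are hypothetical helpers, not the driver's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: assumes the block count is bytes >> blkshift,
 * stored in a 32-bit field as the commit message describes for buf.nlb. */
static uint64_t sketch_max_pdiscard(unsigned blkshift)
{
    return (uint64_t)UINT32_MAX << blkshift;
}

static uint32_t sketch_discard_nlb(int64_t bytes, unsigned blkshift)
{
    /* Callers must respect the limit; otherwise the count would be truncated. */
    assert(bytes >= 0 && (uint64_t)bytes <= sketch_max_pdiscard(blkshift));
    return (uint32_t)((uint64_t)bytes >> blkshift);
}
```

With 512-byte blocks (blkshift 9), the limit works out to just under 2 TiB per request.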
Vladimir Sementsov-Ogievskiy authored
We are generally moving to int64_t for both offset and bytes parameters on all I/O paths.

The main motivation is implementing a 64-bit write_zeroes operation for fast zeroing of large disk chunks, up to the whole disk.

We chose a signed type to be consistent with off_t (which is signed) and to allow a signed return type (where a negative value means error).

So, convert the bytes parameter of the driver write_zeroes handlers to int64_t.

The only caller of all updated functions is bdrv_co_do_pwrite_zeroes().

bdrv_co_do_pwrite_zeroes() itself is of course OK with widening of the callee parameter type. Also, bdrv_co_do_pwrite_zeroes()'s max_write_zeroes is limited to INT_MAX. So all updated functions are safe: they will not get "bytes" larger than before.

Still, let's look through all updated functions, and add assertions to the ones that are actually unprepared for values larger than INT_MAX. For these drivers, also set an explicit max_pwrite_zeroes limit.

Let's go:

blkdebug: calculations can't overflow, thanks to bdrv_check_qiov_request() in the generic layer. rule_check() and bdrv_co_pwrite_zeroes() both have a 64bit argument.
blklogwrites: passes to blk_log_writes_co_log() with a 64bit argument.
blkreplay, copy-on-read, filter-compress: pass to bdrv_co_pwrite_zeroes(), which is OK.
copy-before-write: calls cbw_do_copy_before_write() and bdrv_co_pwrite_zeroes(); both have a 64bit argument.
file-posix: both handlers call raw_do_pwrite_zeroes(), which is updated. In raw_do_pwrite_zeroes(), calculations are OK due to bdrv_check_qiov_request(); bytes go to RawPosixAIOData::aio_nbytes, which is uint64_t. Also check where that uint64_t gets handed: handle_aiocb_write_zeroes_block() passes a uint64_t[2] to ioctl(BLKZEROOUT), and handle_aiocb_write_zeroes() calls do_fallocate(), which takes off_t (and we compile to always have 64-bit off_t), as does handle_aiocb_write_zeroes_unmap(). All look safe.
gluster: bytes go to GlusterAIOCB::size, which is int64_t, and glfs_zerofill_async() works with off_t.
iscsi: Aha, here we deal with iscsi_writesame16_task(), which has a uint32_t num_blocks argument, and iscsi_writesame10_task(), which has a uint16_t argument. Add comments and assertions, and clarify the max_pwrite_zeroes calculation. The iscsi_allocmap_() functions already have an int64_t argument. is_byte_request_lun_aligned() is simple to update, so do it.
mirror_top: passes to bdrv_mirror_top_do_write(), which has a uint64_t argument.
nbd: Aha, here we have a protocol limitation: NBDRequest::len is uint32_t. max_pwrite_zeroes is cleanly set to a 32bit value, so we are OK for now.
nvme: Again, a protocol limitation, and no inherent limit for write-zeroes at all. But the code that calculates cdw12 makes it obvious that we do have a limit and an alignment. Let's clarify them. Also, the code is obviously not prepared to handle bytes=0. Let's handle this case too. Trace events are already 64bit.
preallocate: passes to handle_write() and bdrv_co_pwrite_zeroes(), both 64bit.
rbd: passes to qemu_rbd_start_co(), which is 64bit.
qcow2: offset + bytes and alignment still work well (thanks to bdrv_check_qiov_request()), so the tail calculation is OK. qcow2_subcluster_zeroize() has a 64bit argument and should be OK. Trace events are updated.
qed: qed_co_request() wants int nb_sectors. The code also uses size_t for the request length, which may be 32bit. So, let's just keep INT_MAX as a limit (aligning it down to pwrite_zeroes_alignment) and not worry about it.
raw-format: OK. raw_adjust_offset() and bdrv_co_pwrite_zeroes() are both 64bit.
throttle: both throttle_group_co_io_limits_intercept() and bdrv_co_pwrite_zeroes() are 64bit.
vmdk: passes to vmdk_pwritev(), which is 64bit.
quorum: passes to quorum_co_pwritev(), which is 64bit.

Hooray! At this point all block drivers are prepared to support 64bit write-zero requests, or have explicitly set max_pwrite_zeroes.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210903102807.27127-8-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: use <= rather than < in assertions relying on max_pwrite_zeroes]
Signed-off-by: Eric Blake <eblake@redhat.com>
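The nvme item above mentions a limit and alignment implied by the cdw12 calculation, and a bytes=0 case. In the NVMe Write Zeroes command, the block count (NLB) occupies cdw12 bits 15:0 and is 0's based, so a value of n means n + 1 blocks. A minimal sketch under that assumption follows. The helper names are hypothetical and are not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: NLB is cdw12 bits 15:0, 0's based. */
static uint64_t sketch_max_pwrite_zeroes(unsigned blkshift)
{
    return (uint64_t)1 << (blkshift + 16); /* 65536 logical blocks */
}

/* Returns false when there is nothing to encode (bytes == 0): a 0's based
 * field cannot express zero blocks, so such a request must be skipped. */
static bool sketch_write_zeroes_cdw12(int64_t bytes, unsigned blkshift,
                                      uint32_t *cdw12)
{
    if (bytes == 0) {
        return false;
    }
    assert(bytes > 0 && (uint64_t)bytes <= sketch_max_pwrite_zeroes(blkshift));
    /* The request must be a whole number of logical blocks. */
    assert(((uint64_t)bytes & ((1ULL << blkshift) - 1)) == 0);
    *cdw12 = (uint32_t)(((uint64_t)bytes >> blkshift) - 1) & 0xFFFF;
    return true;
}
```

If bytes=0 were not special-cased, ((0 >> blkshift) - 1) & 0xFFFF would encode 0xFFFF, which means 65536 blocks.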
Vladimir Sementsov-Ogievskiy authored
We are generally moving to int64_t for both offset and bytes parameters on all I/O paths.

The main motivation is implementing a 64-bit write_zeroes operation for fast zeroing of large disk chunks, up to the whole disk.

We chose a signed type to be consistent with off_t (which is signed) and to allow a signed return type (where a negative value means error).

So, convert the driver write handler parameters that are already 64bit to a signed type.

While here, also convert the flags parameter to BdrvRequestFlags.

Now let's consider all callers. A simple

    git grep '\->bdrv_\(aio\|co\)_pwritev\(_part\)\?'

shows that there are three callers of the driver function:

bdrv_driver_pwritev() and bdrv_driver_pwritev_compressed() in block/io.c both pass int64_t, checked by bdrv_check_qiov_request() to be non-negative.

qcow2_save_vmstate() does bdrv_check_qiov_request().

Still, the functions may be called directly, not only via drv->... Let's check:

    git grep '\.bdrv_\(aio\|co\)_pwritev\(_part\)\?\s*=' | \
    awk '{print $4}' | sed 's/,//' | sed 's/&//' | sort | uniq | \
    while read func; do git grep "$func(" | \
    grep -v "$func(BlockDriverState"; done

This shows several callers:

qcow2:
  qcow2_co_truncate() writes at most up to @offset, which is checked in generic qcow2_co_truncate() by bdrv_check_request().
  qcow2_co_pwritev_compressed_task() passes the request (or part of the request) that already went through the normal write path, so it should be OK.
qcow:
  qcow_co_pwritev_compressed() passes int64_t; it's updated by this patch.
quorum:
  quorum_co_pwrite_zeroes() passes int64_t and int - OK.
throttle:
  throttle_co_pwritev_compressed() passes int64_t; it's updated by this patch.
vmdk:
  vmdk_co_pwritev_compressed() passes int64_t; it's updated by this patch.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210903102807.27127-5-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Vladimir Sementsov-Ogievskiy authored
We are generally moving to int64_t for both offset and bytes parameters on all I/O paths.

The main motivation is implementing a 64-bit write_zeroes operation for fast zeroing of large disk chunks, up to the whole disk.

We chose a signed type to be consistent with off_t (which is signed) and to allow a signed return type (where a negative value means error).

So, convert the driver read handler parameters that are already 64bit to a signed type.

While here, also convert the flags parameter to BdrvRequestFlags.

Now let's consider all callers. A simple

    git grep '\->bdrv_\(aio\|co\)_preadv\(_part\)\?'

shows that there are three callers of the driver function:

bdrv_driver_preadv() in block/io.c passes int64_t, checked by bdrv_check_qiov_request() to be non-negative.

qcow2_load_vmstate() does bdrv_check_qiov_request().

do_perform_cow_read() has a uint64_t argument. A lot of things in the qcow2 driver are uint64_t, so converting them is a big job. But we must not work with requests that don't satisfy bdrv_check_qiov_request(), so let's just assert it here.

Still, the functions may be called directly, not only via drv->... Let's check:

    git grep '\.bdrv_\(aio\|co\)_preadv\(_part\)\?\s*=' | \
    awk '{print $4}' | sed 's/,//' | sed 's/&//' | sort | uniq | \
    while read func; do git grep "$func(" | \
    grep -v "$func(BlockDriverState"; done

There is only one such caller:

    QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, &data, 1);
    ...
    ret = bdrv_replace_test_co_preadv(bs, 0, 1, &qiov, 0);

in tests/unit/test-bdrv-drain.c, and it's obviously OK.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210903102807.27127-4-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: fix typos]
Signed-off-by: Eric Blake <eblake@redhat.com>
- Sep 07, 2021
Philippe Mathieu-Daudé authored
We expect the first qemu_vfio_dma_map() to fail (indicating DMA mapping exhaustion, see commit 15a730e7). Do not report the first failure as an error, since we are going to flush the mappings and retry.

This removes the spurious error message displayed on the monitor:

    (qemu) c
    (qemu) qemu-kvm: VFIO_MAP_DMA failed: No space left on device
    (qemu) info status
    VM status: running

Reported-by: Tingting Mao <timao@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20210902070025.197072-12-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Philippe Mathieu-Daudé authored
Currently qemu_vfio_dma_map() displays errors on stderr. When a management interface is used, this information is simply lost.

Pass qemu_vfio_dma_map() an Error** handle so it can propagate the error to callers.

Reviewed-by: Fam Zheng <fam@euphon.net>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20210902070025.197072-7-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Philippe Mathieu-Daudé authored
nvme_create_queue_pair() does not return a boolean value (indicating a possible error) but a pointer, and it is inconsistent in how it fills the error handle.

To fulfill callers' expectations, always set an error message on failure.

Reported-by: Auger Eric <eric.auger@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20210902070025.197072-6-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
- Jul 26, 2021
Philippe Mathieu-Daudé authored
When the NVMe block driver was introduced (see commit bdd6a90a, January 2018), the Linux VFIO_IOMMU_MAP_DMA ioctl only returned -ENOMEM in case of error. The driver correctly handled that error path to recycle its volatile IOVA mappings.

To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit DMA mappings per container", April 2019) added the -ENOSPC error to signal that the user exhausted the DMA mappings available for a container.

The block driver started to misbehave:

    qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device
    (qemu)
    (qemu) info status
    VM status: paused (io-error)
    (qemu) c
    VFIO_MAP_DMA failed: No space left on device
    (qemu) c
    VFIO_MAP_DMA failed: No space left on device

(The VM cannot be resumed from here, hence it is stuck.)

Fix by handling the new -ENOSPC error (DMA mappings exhausted) exactly like the existing -ENOMEM error, so we don't change the behavior on old kernels where the CVE-2019-3882 fix is not present.

An easy way to reproduce this bug is to restrict the DMA mapping limit (65535 by default) when loading the VFIO IOMMU module:

    # modprobe vfio_iommu_type1 dma_entry_limit=666

Cc: qemu-stable@nongnu.org
Cc: Fam Zheng <fam@euphon.net>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Reported-by: Michal Prívozník <mprivozn@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20210723195843.1032825-1-philmd@redhat.com
Fixes: bdd6a90a ("block: Add VFIO based NVMe driver")
Buglink: https://bugs.launchpad.net/qemu/+bug/1863333
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/65
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
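The fix amounts to treating both errno values as "mappings exhausted". Below is a minimal sketch that assumes a flush-and-retry-once policy. sketch_map_with_retry(), FakeContainer and the fake_*() callbacks are hypothetical and are not QEMU's VFIO helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical sketch, not the QEMU VFIO helper. */
typedef int (*sketch_map_fn)(void *opaque);
typedef void (*sketch_flush_fn)(void *opaque);

/* -ENOMEM: older kernels; -ENOSPC: kernels with the CVE-2019-3882 limit. */
static bool sketch_mappings_exhausted(int ret)
{
    return ret == -ENOMEM || ret == -ENOSPC;
}

static int sketch_map_with_retry(sketch_map_fn map, sketch_flush_fn flush,
                                 void *opaque)
{
    int ret = map(opaque);

    if (sketch_mappings_exhausted(ret)) {
        flush(opaque);     /* recycle the volatile IOVA mappings */
        ret = map(opaque); /* a second failure is a real error */
    }
    return ret;
}

/* Fake container used to exercise the helper. */
typedef struct {
    int free_entries;    /* mapping slots still available */
    int capacity;        /* slots available again after a flush */
    int exhausted_errno; /* ENOMEM or ENOSPC, depending on the "kernel" */
    int flushes;         /* number of flushes performed */
} FakeContainer;

static int fake_map(void *opaque)
{
    FakeContainer *c = opaque;

    if (c->free_entries == 0) {
        return -c->exhausted_errno;
    }
    c->free_entries--;
    return 0;
}

static void fake_flush(void *opaque)
{
    FakeContainer *c = opaque;

    c->free_entries = c->capacity;
    c->flushes++;
}
```

With exhausted_errno set to either ENOMEM or ENOSPC, the first map attempt fails, the container is flushed once, and the retry succeeds.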
- Feb 02, 2021
Philippe Mathieu-Daudé authored
NVMe controllers implement different versions of the spec, and different features of it. It is useful to gather this information when debugging.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210127212137.3482291-3-philmd@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Philippe Mathieu-Daudé authored
Commit 15b2260b ("block/nvme: Trace controller capabilities") misunderstood the doorbell stride value from the datasheet; use the correct one. The 'doorbell_scale' variable used a few lines later is correct.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210127212137.3482291-2-philmd@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
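The NVMe specification defines the doorbell stride as 2^(2 + CAP.DSTRD) bytes, and the doorbells start at offset 0x1000 of the controller registers. The stride is therefore not the raw DSTRD value. Below is a minimal sketch of that arithmetic. The helper names are hypothetical, and this is not the driver's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* CAP.DSTRD lives in bits 35:32 of the 64-bit CAP register. */
static unsigned sketch_cap_dstrd(uint64_t cap)
{
    return (unsigned)((cap >> 32) & 0xF);
}

/* The stride is 2^(2 + DSTRD) bytes, i.e. 4 << DSTRD. */
static uint32_t sketch_doorbell_stride(uint64_t cap)
{
    return UINT32_C(4) << sketch_cap_dstrd(cap);
}

/* Submission Queue y Tail Doorbell: 0x1000 + (2y) * stride. */
static uint64_t sketch_sq_tail_doorbell(uint64_t cap, unsigned qid)
{
    return 0x1000 + (uint64_t)(2 * qid) * sketch_doorbell_stride(cap);
}

/* Completion Queue y Head Doorbell: 0x1000 + (2y + 1) * stride. */
static uint64_t sketch_cq_head_doorbell(uint64_t cap, unsigned qid)
{
    return 0x1000 + (uint64_t)(2 * qid + 1) * sketch_doorbell_stride(cap);
}
```

For example, with DSTRD = 1 the stride is 8 bytes, so queue 1's completion head doorbell sits at 0x1018.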
- Dec 18, 2020
Philippe Mathieu-Daudé authored
An NVMe drive cannot be shrunk.

Since commit c80d8b06, we can use the @exact parameter (set to false) to return success if the block device is larger than the requested offset (even though it cannot be shrunk).

Use this parameter to implement the NVMe truncate() coroutine, similarly to how it is done for the iscsi and file-posix drivers (see commit 82325ae5 "Evaluate @exact in protocol drivers").

Reported-by: Xueqiang Wei <xuwei@redhat.com>
Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20201210125202.858656-1-philmd@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
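The @exact logic described above can be sketched for a device whose size is fixed. This is an illustration under stated assumptions. sketch_fixed_size_truncate() is a hypothetical helper, and the specific negative errno values are chosen for the example rather than taken from the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch of @exact semantics for a device that can neither
 * grow nor shrink. The errno values are illustrative. */
static int sketch_fixed_size_truncate(int64_t cur_length, int64_t offset,
                                      bool exact)
{
    if (offset != cur_length && exact) {
        return -ENOTSUP; /* caller demanded exactly @offset bytes */
    }
    if (offset > cur_length) {
        return -EINVAL;  /* cannot grow, even with exact=false */
    }
    return 0;            /* device is at least @offset bytes large */
}
```

With exact=false, "truncating" to a smaller size succeeds and leaves the device untouched, which is what callers that only need a minimum size expect.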
- Nov 03, 2020
Philippe Mathieu-Daudé authored
The Completion Queue Command Identifier is a 16-bit value, so nvme_submit_command() is unlikely to work on big-endian hosts, as the relevant bits are truncated. Fix by using the correct byte-swap function.

Fixes: bdd6a90a ("block: Add VFIO based NVMe driver")
Reported-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20201029093306.1063879-25-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
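One way a 16-bit field gets truncated is by converting it with a 32-bit byte swap. The portable simulation below models a big-endian host regardless of where it actually runs. The sketch_*() helpers are hypothetical stand-ins for le16_to_cpu()/le32_to_cpu() on such a host, not QEMU's implementations.

```c
#include <assert.h>
#include <stdint.h>

static uint16_t sketch_bswap16(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}

static uint32_t sketch_bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0xFF00u) |
           ((x << 8) & 0xFF0000u) | (x << 24);
}

/* What a native 16-bit load of a little-endian field yields on a
 * big-endian host. */
static uint16_t sketch_be_native_load16(uint16_t le_value)
{
    return sketch_bswap16(le_value);
}

/* Wrong: a 32-bit swap moves the identifier into the upper half,
 * which the 16-bit result then truncates away. */
static uint16_t sketch_cid_buggy(uint16_t raw)
{
    return (uint16_t)sketch_bswap32(raw);
}

/* Right: a 16-bit swap restores the original identifier. */
static uint16_t sketch_cid_fixed(uint16_t raw)
{
    return sketch_bswap16(raw);
}
```

For a command identifier of 0x0123, the buggy path returns 0 and the fixed path returns 0x0123.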
Philippe Mathieu-Daudé authored
qemu_vfio_pci_map_bar() calls mmap(), and mmap(2) states:

    'offset' must be a multiple of the page size as returned by sysconf(_SC_PAGE_SIZE).

In commit f6845323 we started to use an offset of 4K, which broke this contract on AArch64. Fix by mapping at offset 0 and accessing the doorbells at offset=4K.

Fixes: f6845323 ("block/nvme: Map doorbells pages write-only")
Reported-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-24-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
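The mmap(2) constraint can be checked with simple arithmetic. With 64 KiB host pages, an offset of 0x1000 is not a page multiple, while offset 0 always is. The sketch below is illustrative only. The helper names are hypothetical, and the 0x1000 doorbell offset is the 4K offset given in the commit message.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* mmap(2) requires offset to be a multiple of the page size. */
static bool sketch_mmap_offset_valid(off_t offset, long page_size)
{
    return page_size > 0 && offset % page_size == 0;
}

/* Map the BAR from offset 0 and derive the doorbell address from the
 * mapping base instead of passing 0x1000 as the mmap() offset. */
static volatile uint32_t *sketch_doorbells(void *bar_base)
{
    return (volatile uint32_t *)((uint8_t *)bar_base + 0x1000);
}

static long sketch_host_page_size(void)
{
    return sysconf(_SC_PAGESIZE);
}
```

On a 4 KiB-page host, both offsets happen to be valid, which is why the problem only surfaced on hosts with larger pages.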
Eric Auger authored
Make sure each iovec's virtual address and size are properly aligned to the host page size.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-23-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
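Below is a minimal sketch of such an alignment check, assuming a power-of-two page size. sketch_iov_page_aligned() is a hypothetical helper, not the driver's function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Returns true if every iovec's base address and length are multiples
 * of page_size (which must be a power of two). */
static bool sketch_iov_page_aligned(const struct iovec *iov, int niov,
                                    size_t page_size)
{
    assert(page_size && (page_size & (page_size - 1)) == 0);
    for (int i = 0; i < niov; i++) {
        if (((uintptr_t)iov[i].iov_base & (page_size - 1)) != 0 ||
            (iov[i].iov_len & (page_size - 1)) != 0) {
            return false;
        }
    }
    return true;
}
```

A buffer at address 0x11000 passes with 4 KiB pages but fails with 64 KiB pages, which is exactly the case this change guards against.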
Eric Auger authored
In preparation for 64kB host page support, change the size and alignment of the prp_list_pages so that the VFIO DMA MAP succeeds with a 64kB host page size. We align on the host page size.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-22-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Eric Auger authored
In preparation for 64kB host page support, change the size and alignment of the queue so that the VFIO DMA MAP succeeds. We align on the host page size.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-21-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Eric Auger authored
In preparation for 64kB host page support, change the size and alignment of the IDENTIFY command response buffer so that the VFIO DMA MAP succeeds. We align on the host page size.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-20-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
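Aligning on the host page size includes rounding each buffer size up to a multiple of that page size, in addition to allocating the buffer with that alignment. Below is a minimal sketch of the size arithmetic, assuming a power-of-two page size. sketch_round_up() is a standalone re-implementation for illustration, written in the spirit of QEMU's ROUND_UP().

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to a multiple of align (a power of two). */
static size_t sketch_round_up(size_t n, size_t align)
{
    assert(align && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}
```

For example, a 4096-byte IDENTIFY response buffer grows to 65536 bytes on a 64 KiB-page host, while on a 4 KiB-page host its size is unchanged.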
Philippe Mathieu-Daudé authored
While trying to simplify the code using a macro, we forgot the 12-bit shift... Correct that.

Fixes: fad1eb68 ("block/nvme: Use register definitions from 'block/nvme.h'")
Reported-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-19-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
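NVMe encodes page sizes in the CAP register as 2^(12 + n), with MPSMIN in bits 51:48 and MPSMAX in bits 55:52. The sketch assumes the forgotten shift is the one applied to CAP.MPSMIN when computing the minimum device page size, and shows the decoding. Helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static unsigned sketch_cap_mpsmin(uint64_t cap)
{
    return (unsigned)((cap >> 48) & 0xF);
}

static unsigned sketch_cap_mpsmax(uint64_t cap)
{
    return (unsigned)((cap >> 52) & 0xF);
}

/* Page size = 2^(12 + field). Dropping the "+ 12" turns a 4 KiB
 * minimum into 1 byte. */
static uint64_t sketch_min_page_size(uint64_t cap)
{
    return UINT64_C(1) << (12 + sketch_cap_mpsmin(cap));
}

static uint64_t sketch_max_page_size(uint64_t cap)
{
    return UINT64_C(1) << (12 + sketch_cap_mpsmax(cap));
}
```

A controller reporting MPSMIN = 0 and MPSMAX = 4 supports page sizes from 4 KiB to 64 KiB.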
Philippe Mathieu-Daudé authored
Commit bdd6a90a ("block: Add VFIO based NVMe driver") sets the request_alignment in nvme_refresh_limits(). For consistency, also set it during initialization.

Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-18-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Philippe Mathieu-Daudé authored
As all commands use the ADMIN queue, it is pointless to pass it as an argument each time. Remove the argument, and rename the function to nvme_admin_cmd_sync() to make this new behavior clearer.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20201029093306.1063879-17-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Philippe Mathieu-Daudé authored
We don't need to dereference BDRVNVMeState each time. Use an NVMeQueuePair pointer to the admin queue. nvme_init() becomes easier to review, matching the style of nvme_add_io_queue().

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-16-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Philippe Mathieu-Daudé authored
From the specification chapter 3.1.8 "AQA - Admin Queue Attributes", the Admin Submission Queue Size field is a 0’s based value:

    Admin Submission Queue Size (ASQS):

      Defines the size of the Admin Submission Queue in entries.
      Enabling a controller while this field is cleared to 00h
      produces undefined results. The minimum size of the Admin
      Submission Queue is two entries. The maximum size of the
      Admin Submission Queue is 4096 entries.
      This is a 0’s based value.

This bug has never been hit because the device initialization uses a single command synchronously :)

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-15-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
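Because both AQA size fields are 0's based, a queue of n entries is programmed as n - 1. Below is a minimal sketch, assuming the spec's layout with ACQS in bits 27:16 and ASQS in bits 11:0. sketch_aqa() is a hypothetical helper, not the driver's code.

```c
#include <assert.h>
#include <stdint.h>

/* Build an AQA register value from admin queue sizes in entries. */
static uint32_t sketch_aqa(unsigned sq_entries, unsigned cq_entries)
{
    /* The spec allows 2..4096 entries for the admin queues. */
    assert(sq_entries >= 2 && sq_entries <= 4096);
    assert(cq_entries >= 2 && cq_entries <= 4096);
    return ((uint32_t)(cq_entries - 1) << 16) | (uint32_t)(sq_entries - 1);
}
```

A 64-entry admin queue pair is therefore written as 0x003F003F, not 0x00400040.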
Philippe Mathieu-Daudé authored
Replace magic values with definitions, and simplify, since the number of queues will never reach 64K.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-14-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Philippe Mathieu-Daudé authored
Just for consistency, following the example documented since commit e3fe3988 ("error: Document Error API usage rules"), return a boolean value indicating whether an error is set. Directly pass errp, as local_err is not needed in our case. This simplifies nvme_create_queue_pair() a bit.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-12-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
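The convention referenced above can be sketched in plain C: return true on success, return false with *errp set on failure, and pass errp straight through to callees. SketchError and the sketch_*() functions are hypothetical stand-ins. QEMU's real Error API lives in include/qapi/error.h and differs in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-in for an Error object, for illustration only. */
typedef struct SketchError {
    char msg[64];
} SketchError;

static void sketch_error_setg(SketchError **errp, const char *msg)
{
    /* errp may be NULL when the caller does not care about details. */
    if (errp && !*errp) {
        *errp = calloc(1, sizeof(**errp));
        if (*errp) {
            snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
        }
    }
}

/* Convention: true on success, false with *errp set on failure. */
static bool sketch_init_queue(unsigned entries, SketchError **errp)
{
    if (entries < 2) {
        sketch_error_setg(errp, "queue needs at least 2 entries");
        return false;
    }
    return true;
}

/* The caller tests the boolean and passes errp through unchanged,
 * so no local error variable is needed. */
static void *sketch_create_queue_pair(unsigned entries, SketchError **errp)
{
    static int dummy_queue;

    if (!sketch_init_queue(entries, errp)) {
        return NULL;
    }
    return &dummy_queue;
}
```

Testing the boolean return value instead of inspecting a local error object is what makes the local_err variable unnecessary.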
Philippe Mathieu-Daudé authored
Just for consistency, following the example documented since commit e3fe3988 ("error: Document Error API usage rules"), return a boolean value indicating whether an error is set. Directly pass errp, as local_err is not needed in our case.

Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20201029093306.1063879-11-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Philippe Mathieu-Daudé authored
We cannot have a negative queue count/size/index, so use an unsigned type. Rename 'nr_queues' to 'queue_count' to match the spec naming.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-10-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Philippe Mathieu-Daudé authored
To be able to use some definitions in structure declarations, move them earlier. No logical change.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-9-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Philippe Mathieu-Daudé authored
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-8-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Philippe Mathieu-Daudé authored
What we want to trace is the block driver state and the queue index.

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-7-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Philippe Mathieu-Daudé authored
As we want to enable multiple queues, report the event in each nvme_poll_queue() call, rather than once in the callback calling nvme_poll_queues().

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-6-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Philippe Mathieu-Daudé authored
Controllers have different capabilities and report them in the CAP register. We are particularly interested in the page size limits.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-5-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Philippe Mathieu-Daudé authored
Instead of printing warnings to stderr, use warn_report(), which also displays them on the monitor.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-4-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
- Oct 23, 2020
Philippe Mathieu-Daudé authored
Keep statistics of some hardware errors, and the number of aligned/unaligned I/O accesses.

QMP example booting a full RHEL 8.3 aarch64 guest:

    { "execute": "query-blockstats" }
    {
        "return": [
            {
                "device": "",
                "node-name": "drive0",
                "stats": {
                    "flush_total_time_ns": 6026948,
                    "wr_highest_offset": 3383991230464,
                    "wr_total_time_ns": 807450995,
                    "failed_wr_operations": 0,
                    "failed_rd_operations": 0,
                    "wr_merged": 3,
                    "wr_bytes": 50133504,
                    "failed_unmap_operations": 0,
                    "failed_flush_operations": 0,
                    "account_invalid": false,
                    "rd_total_time_ns": 1846979900,
                    "flush_operations": 130,
                    "wr_operations": 659,
                    "rd_merged": 1192,
                    "rd_bytes": 218244096,
                    "account_failed": false,
                    "idle_time_ns": 2678641497,
                    "rd_operations": 7406
                },
                "driver-specific": {
                    "driver": "nvme",
                    "completion-errors": 0,
                    "unaligned-accesses": 2959,
                    "aligned-accesses": 4477
                },
                "qdev": "/machine/peripheral-anon/device[0]/virtio-backend"
            }
        ]
    }

Suggested-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20201001162939.1567915-1-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
- Oct 05, 2020
Philippe Mathieu-Daudé authored
Use the self-explanatory SCALE_MS definition instead of a magic value (missed in the similar commit e4f310fe).

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200922083821.578519-7-philmd@redhat.com>
Philippe Mathieu-Daudé authored
Use the NVMe register definitions from "block/nvme.h", which makes it a bit easier to review the code against the datasheet.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200922083821.578519-6-philmd@redhat.com>
Philippe Mathieu-Daudé authored
NVMeRegs only contains NvmeBar. Simplify the code by using NvmeBar directly.

This triggers a checkpatch.pl error:

    ERROR: Use of volatile is usually wrong, please add a comment
    #30: FILE: block/nvme.c:691:
    +    volatile NvmeBar *regs;

This is a false positive: in our case we are accessing I/O registers, so the use of 'volatile' is justified.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200922083821.578519-5-philmd@redhat.com>