     block: Collapse padded I/O vecs exceeding IOV_MAX (commit 18743311)
     Hanna Czenczek authored
    When processing vectored guest requests that are not aligned to the
    storage request alignment, we pad them by adding head and/or tail
    buffers for a read-modify-write cycle.
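     
     For illustration only (hypothetical helper; not code from this patch),
     the head and tail padding fall out of the request alignment roughly
     like this:
     
         #include <stdint.h>
         
         /* Illustrative sketch: derive head/tail padding for a guest request
          * at @offset of @bytes bytes, given the storage alignment @align. */
         static void compute_padding(uint64_t offset, uint64_t bytes,
                                     uint64_t align,
                                     uint64_t *head, uint64_t *tail)
         {
             *head = offset % align;            /* unaligned bytes before */
             *tail = (offset + bytes) % align;
             if (*tail) {
                 *tail = align - *tail;         /* unaligned bytes after */
             }
             /*
              * The padded request covers
              * [offset - *head, offset + bytes + *tail); the head and tail
              * areas are first read from the image (read-modify-write).
              */
         }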
    
    The guest can submit I/O vectors up to IOV_MAX (1024) in length, but
    with this padding, the vector can exceed that limit.  As of
    4c002cef ("util/iov: make
    qemu_iovec_init_extended() honest"), we refuse to pad vectors beyond the
    limit, instead returning an error to the guest.
    
    To the guest, this appears as a random I/O error.  We should not return
    an I/O error to the guest when it issued a perfectly valid request.
    
     Before 4c002cef, we simply let the vector grow
     longer than IOV_MAX, which generally seems to work: because the guest
     assumes a smaller alignment than we really have, file-posix's
     raw_co_prw() will generally see bdrv_qiov_is_aligned() return false
     and thus emulate the request, so the IOV_MAX limit does not matter.
     However, relying on that is not exactly great.
    
    I see two ways to fix this problem:
    1. We split such long requests into two requests.
    2. We join some elements of the vector into new buffers to make it
       shorter.
    
    I am wary of (1), because it seems like it may have unintended side
    effects.
    
     (2), on the other hand, seems relatively simple to implement, with
     hopefully few side effects, so this patch does that.
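     
     As a minimal sketch of that idea (hypothetical helper with assumed
     names; the actual logic lives in bdrv_create_padded_qiov()): if the
     guest vector plus the padding entries would exceed the limit, enough
     trailing guest entries are folded into one bounce buffer so the total
     fits:
     
         #include <stdint.h>
         #include <string.h>
         #include <sys/uio.h>
         #include <glib.h>
         
         /* Sketch only: fold trailing entries of @iov (with @niov entries)
          * into a single bounce buffer so that no more than @max_niov
          * entries remain.  Returns the new entry count. */
         static int collapse_tail_to_fit(struct iovec *iov, int niov,
                                         int max_niov, void **bounce_buf)
         {
             int merge = niov - max_niov + 1;   /* entries to fold into one */
             size_t bytes = 0;
             uint8_t *buf, *p;
             int i;
         
             *bounce_buf = NULL;
             if (merge <= 1) {
                 return niov;                   /* already fits */
             }
         
             for (i = niov - merge; i < niov; i++) {
                 bytes += iov[i].iov_len;
             }
             p = buf = g_malloc(bytes);         /* the real code must also
                                                   honor buffer alignment */
         
             /* Copy the guest data in (needed for writes; for reads, the
              * data is copied back to the guest buffers on completion). */
             for (i = niov - merge; i < niov; i++) {
                 memcpy(p, iov[i].iov_base, iov[i].iov_len);
                 p += iov[i].iov_len;
             }
         
             iov[niov - merge].iov_base = buf;
             iov[niov - merge].iov_len = bytes;
             *bounce_buf = buf;
             return niov - merge + 1;           /* == max_niov */
         }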
    
    To do this, the use of qemu_iovec_init_extended() in bdrv_pad_request()
    is effectively replaced by the new function bdrv_create_padded_qiov(),
    which not only wraps the request IOV with padding head/tail, but also
    ensures that the resulting vector will not have more than IOV_MAX
    elements.  Putting that functionality into qemu_iovec_init_extended() is
    infeasible because it requires allocating a bounce buffer; doing so
    would require many more parameters (buffer alignment, how to initialize
    the buffer, and out parameters like the buffer, its length, and the
    original elements), which is not reasonable.
    
    Conversely, it is not difficult to move qemu_iovec_init_extended()'s
    functionality into bdrv_create_padded_qiov() by using public
    qemu_iovec_* functions, so that is what this patch does.
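     
     For example, assembling the padded vector with the public helpers
     could look roughly like this (hypothetical function and buffer names;
     not the actual bdrv_create_padded_qiov() code):
     
         #include "qemu/osdep.h"
         #include "qemu/iov.h"
         
         /* Sketch: build the padded vector from a (possibly collapsed)
          * guest vector plus head/tail padding buffers. */
         static void build_padded_qiov(QEMUIOVector *padded,
                                       void *head_buf, size_t head_bytes,
                                       struct iovec *iov, int niov,
                                       void *tail_buf, size_t tail_bytes)
         {
             int i;
         
             qemu_iovec_init(padded, 1 + niov + 1);  /* head + guest + tail */
             if (head_bytes) {
                 qemu_iovec_add(padded, head_buf, head_bytes);
             }
             for (i = 0; i < niov; i++) {
                 /* guest entries, possibly already collapsed as above */
                 qemu_iovec_add(padded, iov[i].iov_base, iov[i].iov_len);
             }
             if (tail_bytes) {
                 qemu_iovec_add(padded, tail_buf, tail_bytes);
             }
         }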
    
    Because bdrv_pad_request() was the only "serious" user of
    qemu_iovec_init_extended(), the next patch will remove the latter
    function, so the functionality is not implemented twice.
    
    Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2141964
    
    
     Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
    Message-Id: <20230411173418.19549-3-hreitz@redhat.com>
     Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>