  1. Sep 29, 2021
    • block: use int64_t instead of int in driver discard handlers · 0c802287
      Vladimir Sementsov-Ogievskiy authored
      
      We are generally moving to int64_t for both offset and bytes parameters
      on all io paths.
      
      Main motivation is realization of 64-bit write_zeroes operation for
      fast zeroing large disk chunks, up to the whole disk.
      
      We chose signed type, to be consistent with off_t (which is signed) and
      with possibility for signed return type (where negative value means
      error).
      
      So, convert driver discard handlers bytes parameter to int64_t.
      
      The only caller of all updated functions is bdrv_co_pdiscard() in
      block/io.c. It is already prepared to work with 64bit requests, but
      passes at most min(bs->bl.max_pdiscard, INT_MAX) to the driver.
      
      Let's look at all updated functions:
      
      blkdebug: all calculations are still OK, thanks to
        bdrv_check_qiov_request().
        both rule_check and bdrv_co_pdiscard are 64bit
      
      blklogwrites: pass to blk_log_writes_co_log which is 64bit
      
      blkreplay, copy-on-read, filter-compress: pass to bdrv_co_pdiscard, OK
      
      copy-before-write: pass to bdrv_co_pdiscard which is 64bit and to
        cbw_do_copy_before_write which is 64bit
      
      file-posix: one handler calls raw_account_discard(), which is 64bit, and
        both handlers call raw_do_pdiscard(). Update raw_do_pdiscard(), which
        passes to RawPosixAIOData::aio_nbytes, which is 64bit (and calls
        raw_account_discard())
      
      gluster: somehow, third argument of glfs_discard_async is size_t.
        Let's set max_pdiscard accordingly.
      
      iscsi: iscsi_allocmap_set_invalid is 64bit,
        !is_byte_request_lun_aligned is 64bit.
        list.num is uint32_t. Let's clarify max_pdiscard and
        pdiscard_alignment.
      
      mirror_top: pass to bdrv_mirror_top_do_write() which is
        64bit
      
      nbd: protocol limitation. max_pdiscard is already set strict enough,
        keep it as is for now.
      
      nvme: buf.nlb is uint32_t and we do shift. So, add corresponding limits
        to nvme_refresh_limits().
      
      preallocate: pass to bdrv_co_pdiscard() which is 64bit.
      
      rbd: pass to qemu_rbd_start_co() which is 64bit.
      
      qcow2: calculations are still OK, thanks to bdrv_check_qiov_request(),
        qcow2_cluster_discard() is 64bit.
      
      raw-format: raw_adjust_offset() is 64bit, bdrv_co_pdiscard too.
      
      throttle: pass to bdrv_co_pdiscard() which is 64bit and to
        throttle_group_co_io_limits_intercept() which is 64bit as well.
      
      test-block-iothread: bytes argument is unused
      
      Great! Now all drivers are prepared to handle 64bit discard requests,
      or else have explicit max_pdiscard limits.
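
      As an illustration of the caller-side clamping described above, here
      is a minimal sketch, not the code in block/io.c. The names
      discard_in_chunks(), discard_fn, chunk_stats and record_chunk() are
      hypothetical, alignment handling is left out, and a max_pdiscard of 0
      is assumed to mean "no driver limit", so the INT_MAX cap applies.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for a driver discard handler with int64_t bytes. */
typedef int (*discard_fn)(int64_t offset, int64_t bytes, void *opaque);

/* Example handler state: counts calls and tracks the sizes it was given. */
struct chunk_stats {
    int calls;
    int64_t largest;
    int64_t total;
};

/* Example handler: records each chunk instead of touching a device. */
static int record_chunk(int64_t offset, int64_t bytes, void *opaque)
{
    struct chunk_stats *st = opaque;

    (void)offset;
    st->calls++;
    st->total += bytes;
    if (bytes > st->largest) {
        st->largest = bytes;
    }
    return 0;
}

/*
 * Split [offset, offset + bytes) into chunks no larger than the limit.
 * Assumption: max_pdiscard == 0 means the driver set no limit of its own.
 * Returns 0 on success or the first negative value from the handler.
 */
static int discard_in_chunks(int64_t offset, int64_t bytes,
                             int64_t max_pdiscard,
                             discard_fn fn, void *opaque)
{
    int64_t limit = (max_pdiscard > 0 && max_pdiscard < INT_MAX)
                    ? max_pdiscard : INT_MAX;

    /* The whole request must fit in a signed 64-bit range. */
    assert(offset >= 0 && bytes >= 0 && offset <= INT64_MAX - bytes);

    while (bytes > 0) {
        int64_t num = bytes < limit ? bytes : limit;
        int ret = fn(offset, num, opaque);

        if (ret < 0) {
            return ret;
        }
        offset += num;
        bytes -= num;
    }
    return 0;
}
```

      With this shape the handler never sees more than the limit, which is
      why widening its bytes parameter to int64_t cannot hand it larger
      values than before.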
      
      Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Message-Id: <20210903102807.27127-11-vsementsov@virtuozzo.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Eric Blake <eblake@redhat.com>
    • block: use int64_t instead of int in driver write_zeroes handlers · f34b2bcf
      Vladimir Sementsov-Ogievskiy authored
      
      We are generally moving to int64_t for both offset and bytes parameters
      on all io paths.
      
      Main motivation is realization of 64-bit write_zeroes operation for
      fast zeroing large disk chunks, up to the whole disk.
      
      We chose signed type, to be consistent with off_t (which is signed) and
      with possibility for signed return type (where negative value means
      error).
      
      So, convert driver write_zeroes handlers bytes parameter to int64_t.
      
      The only caller of all updated functions is bdrv_co_do_pwrite_zeroes().
      
      bdrv_co_do_pwrite_zeroes() itself is of course OK with widening of
      callee parameter type. Also, bdrv_co_do_pwrite_zeroes()'s
      max_write_zeroes is limited to INT_MAX. So, updated functions all are
      safe, they will not get "bytes" larger than before.
      
      Still, let's look through all updated functions, and add assertions to
      the ones which are actually unprepared to values larger than INT_MAX.
      For these drivers also set explicit max_pwrite_zeroes limit.
      
      Let's go:
      
      blkdebug: calculations can't overflow, thanks to
        bdrv_check_qiov_request() in generic layer. rule_check() and
        bdrv_co_pwrite_zeroes() both have 64bit argument.
      
      blklogwrites: pass to blk_log_writes_co_log() with 64bit argument.
      
      blkreplay, copy-on-read, filter-compress: pass to
        bdrv_co_pwrite_zeroes() which is OK
      
      copy-before-write: Calls cbw_do_copy_before_write() and
        bdrv_co_pwrite_zeroes, both have 64bit argument.
      
      file-posix: both handlers call raw_do_pwrite_zeroes, which is updated.
        In raw_do_pwrite_zeroes() calculations are OK due to
        bdrv_check_qiov_request(), bytes go to RawPosixAIOData::aio_nbytes
        which is uint64_t.
        Check also where that uint64_t gets handed:
        handle_aiocb_write_zeroes_block() passes a uint64_t[2] to
        ioctl(BLKZEROOUT), handle_aiocb_write_zeroes() calls do_fallocate()
        which takes off_t (and we compile to always have 64-bit off_t), as
        does handle_aiocb_write_zeroes_unmap. All look safe.
      
      gluster: bytes go to GlusterAIOCB::size which is int64_t and to
        glfs_zerofill_async(), which works with off_t.
      
      iscsi: Aha, here we deal with iscsi_writesame16_task() that has
        uint32_t num_blocks argument and iscsi_writesame10_task() that has
        uint16_t argument. Make comments, add assertions and clarify
        max_pwrite_zeroes calculation.
        iscsi_allocmap_() functions already have int64_t argument.
        is_byte_request_lun_aligned is simple to update, do it.
      
      mirror_top: pass to bdrv_mirror_top_do_write which has uint64_t
        argument
      
      nbd: Aha, here we have protocol limitation, and NBDRequest::len is
        uint32_t. max_pwrite_zeroes is cleanly set to 32bit value, so we are
        OK for now.
      
      nvme: Again, protocol limitation. And no inherent limit for
        write-zeroes at all. But from code that calculates cdw12 it's obvious
        that we do have limit and alignment. Let's clarify it. Also,
        obviously the code is not prepared to handle bytes=0. Let's handle
        this case too.
        trace events already 64bit
      
      preallocate: pass to handle_write() and bdrv_co_pwrite_zeroes(), both
        64bit.
      
      rbd: pass to qemu_rbd_start_co() which is 64bit.
      
      qcow2: offset + bytes and alignment still work well (thanks to
        bdrv_check_qiov_request()), so tail calculation is OK.
        qcow2_subcluster_zeroize() has 64bit argument, should be OK.
        trace events updated
      
      qed: qed_co_request wants int nb_sectors. Also in code we have size_t
        used for request length which may be 32bit. So, let's just keep
        INT_MAX as a limit (aligning it down to pwrite_zeroes_alignment) and
        don't care.
      
      raw-format: Is OK. raw_adjust_offset and bdrv_co_pwrite_zeroes are both
        64bit.
      
      throttle: Both throttle_group_co_io_limits_intercept() and
        bdrv_co_pwrite_zeroes() are 64bit.
      
      vmdk: pass to vmdk_pwritev which is 64bit
      
      quorum: pass to quorum_co_pwritev() which is 64bit
      
      Hooray!
      
      At this point all block drivers are prepared to support 64bit
      write-zero requests, or have explicitly set max_pwrite_zeroes.
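
      For drivers whose protocol field is narrower than int64_t, the pairing
      of an explicit limit with an assertion can be sketched as below. This
      is an illustration under assumptions, not code from any driver:
      max_bytes_for_u32_blocks() and bytes_to_blocks() are hypothetical
      helpers, and the block size is assumed to fit in 31 bits so the
      multiplication cannot overflow int64_t.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest byte count whose block count still fits a 32-bit field.
 * Assumption: block_size <= INT32_MAX, so the product fits in int64_t.
 */
static int64_t max_bytes_for_u32_blocks(uint32_t block_size)
{
    assert(block_size > 0 && block_size <= INT32_MAX);
    return (int64_t)UINT32_MAX * block_size;
}

/*
 * Convert a 64-bit byte count to a 32-bit block count.
 * The assertion uses <= because a request of exactly the limit is valid.
 */
static uint32_t bytes_to_blocks(int64_t bytes, uint32_t block_size)
{
    assert(block_size > 0 && block_size <= INT32_MAX);
    assert(bytes >= 0 && bytes % block_size == 0);
    assert(bytes <= max_bytes_for_u32_blocks(block_size));
    return (uint32_t)(bytes / block_size);
}
```

      A driver following this pattern would advertise the value returned by
      the first helper as its limit, so the generic layer never sends a
      request that trips the assertion in the second.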
      
      Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Message-Id: <20210903102807.27127-8-vsementsov@virtuozzo.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      [eblake: use <= rather than < in assertions relying on max_pwrite_zeroes]
      Signed-off-by: Eric Blake <eblake@redhat.com>
  2. Sep 15, 2021
  3. Jan 28, 2021
  4. Dec 19, 2020
    • qapi: Use QAPI_LIST_PREPEND() where possible · 54aa3de7
      Eric Blake authored
      
      Anywhere we create a list of just one item or by prepending items
      (typically because order doesn't matter), we can use
      QAPI_LIST_PREPEND().  But places where we must keep the list in order
      by appending remain open-coded until later patches.
      
      Note that as a side effect, this also performs a cleanup of two minor
      issues in qga/commands-posix.c: the old code was performing
       new = g_malloc0(sizeof(*ret));
      which 1) is confusing because you have to verify whether 'new' and
      'ret' are variables with the same type, and 2) would conflict with C++
      compilation (not an actual problem for this file, but makes
      copy-and-paste harder).
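
      To show what a prepend helper buys over open-coded allocation, here is
      a simplified analogue, not the QEMU macro itself. StrList,
      LIST_PREPEND(), list_length() and list_free() are hypothetical names,
      and plain malloc() stands in for the glib allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical list node, shaped like a generated QAPI list type. */
typedef struct StrList {
    struct StrList *next;
    const char *value;
} StrList;

/*
 * Simplified analogue of a prepend macro. The allocation size is tied to
 * the variable being assigned, so there is no second variable whose type
 * the reader has to cross-check.
 */
#define LIST_PREPEND(list, element) do {            \
        StrList *tmp_ = malloc(sizeof(*tmp_));      \
        assert(tmp_ != NULL);                       \
        tmp_->value = (element);                    \
        tmp_->next = (list);                        \
        (list) = tmp_;                              \
    } while (0)

/* Count the nodes in the list. */
static size_t list_length(const StrList *list)
{
    size_t n = 0;

    for (; list; list = list->next) {
        n++;
    }
    return n;
}

/* Free every node; the values are not owned by the list. */
static void list_free(StrList *list)
{
    while (list) {
        StrList *next = list->next;

        free(list);
        list = next;
    }
}
```

      Prepending reverses insertion order, which is why the approach only
      fits places where order does not matter or a single item is added.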
      
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20201113011340.463563-5-eblake@redhat.com>
      Reviewed-by: Markus Armbruster <armbru@redhat.com>
      Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
      [Straightforward conflicts due to commit a8aa94b5 "qga: update
      schema for guest-get-disks 'dependents' field" and commit a10b453a
      "target/mips: Move mips_cpu_add_definition() from helper.c to cpu.c"
      resolved.  Commit message tweaked.]
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
  5. Jul 10, 2020
    • error: Reduce unnecessary error propagation · a5f9b9df
      Markus Armbruster authored
      
      When all we do with an Error we receive into a local variable is
      propagating to somewhere else, we can just as well receive it there
      right away, even when we need to keep error_propagate() for other
      error paths.
      
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20200707160613.848843-38-armbru@redhat.com>
    • error: Eliminate error_propagate() with Coccinelle, part 2 · af175e85
      Markus Armbruster authored
      
      When all we do with an Error we receive into a local variable is
      propagating to somewhere else, we can just as well receive it there
      right away.  The previous commit did that with a Coccinelle script I
      consider fairly trustworthy.  This commit uses the same script with
      the matching of return taken out, i.e. we convert
      
          if (!foo(..., &err)) {
              ...
              error_propagate(errp, err);
              ...
          }
      
      to
      
          if (!foo(..., errp)) {
              ...
              ...
          }
      
      This is unsound: @err could still be read afterwards.  I don't
      know how to express "no read of @err without an intervening write" in
      Coccinelle.  Instead, I manually double-checked for uses of @err.
      
      Suboptimal line breaks tweaked manually.  qdev_realize() simplified
      further to placate scripts/checkpatch.pl.
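
      The before and after shapes can be compared in a self-contained
      sketch. Everything here is a stand-in: the Error struct is a
      simplified placeholder for QEMU's type, and parse_value(),
      configure_old() and configure_new() are hypothetical functions. A
      plain pointer assignment takes the place of error_propagate().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for QEMU's Error; not the real type. */
typedef struct Error {
    const char *msg;
} Error;

static Error parse_failed = { "value must be non-negative" };

/* Hypothetical callee: returns false and sets *errp on failure. */
static bool parse_value(int input, int *out, Error **errp)
{
    if (input < 0) {
        if (errp) {
            *errp = &parse_failed;
        }
        return false;
    }
    *out = input;
    return true;
}

/* Before: receive into a local variable, then hand it on. */
static bool configure_old(int input, int *out, Error **errp)
{
    Error *err = NULL;

    if (!parse_value(input, out, &err)) {
        if (errp) {
            *errp = err;    /* stands in for error_propagate(errp, err) */
        }
        return false;
    }
    return true;
}

/* After: let the callee store the error straight into errp. */
static bool configure_new(int input, int *out, Error **errp)
{
    if (!parse_value(input, out, errp)) {
        return false;
    }
    return true;
}
```

      The conversion is only safe when nothing reads the local variable
      after the call, which is the property that had to be checked by hand.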
      
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20200707160613.848843-36-armbru@redhat.com>
    • qemu-option: Use returned bool to check for failure · 235e59cf
      Markus Armbruster authored
      
      The previous commit enables conversion of
      
          foo(..., &err);
          if (err) {
              ...
          }
      
      to
      
          if (!foo(..., &err)) {
              ...
          }
      
      for QemuOpts functions that now return true / false on success /
      error.  Coccinelle script:
      
          @@
          identifier fun = {
              opts_do_parse, parse_option_bool, parse_option_number,
              parse_option_size, qemu_opt_parse, qemu_opt_rename, qemu_opt_set,
              qemu_opt_set_bool, qemu_opt_set_number, qemu_opts_absorb_qdict,
              qemu_opts_do_parse, qemu_opts_from_qdict_entry, qemu_opts_set,
              qemu_opts_validate
          };
          expression list args, args2;
          typedef Error;
          Error *err;
          @@
          -    fun(args, &err, args2);
          -    if (err)
          +    if (!fun(args, &err, args2))
               {
                   ...
               }
      
      A few line breaks tidied up manually.
      
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Message-Id: <20200707160613.848843-15-armbru@redhat.com>
      [Conflict with commit 0b6786a9 "block/amend: refactor qcow2 amend
      options" resolved by rerunning Coccinelle on master's version]
  6. May 08, 2020
  7. Apr 30, 2020
  8. Mar 26, 2020
  9. Oct 28, 2019
    • block: Add @exact parameter to bdrv_co_truncate() · c80d8b06
      Hanna Reitz authored
      
      We have two drivers (iscsi and file-posix) that (in some cases) return
      success from their .bdrv_co_truncate() implementation if the block
      device is larger than the requested offset, but cannot be shrunk.  Some
      callers do not want that behavior, so this patch adds a new parameter
      that they can use to turn off that behavior.
      
      This patch just adds the parameter and lets the block/io.c and
      block/block-backend.c functions pass it around.  All other callers
      always pass false and none of the implementations evaluate it, so that
      this patch does not change existing behavior.  Future patches take care
      of that.
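
      The semantics the parameter is meant to enable, once implementations
      evaluate it, might look like the sketch below. It is an assumption
      about intent, not code from iscsi or file-posix:
      fixed_size_truncate() is a hypothetical function for a device that
      cannot be resized, and the error codes are illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical truncate for a device whose size cannot be changed.
 * The error codes are illustrative, not taken from any QEMU driver.
 */
static int fixed_size_truncate(int64_t device_size, int64_t offset, bool exact)
{
    assert(device_size >= 0 && offset >= 0);

    if (offset == device_size) {
        return 0;               /* already the requested size */
    }
    if (offset > device_size) {
        return -ENOTSUP;        /* cannot grow the device */
    }
    /* The device is larger than requested and cannot be shrunk. */
    return exact ? -ENOTSUP : 0;
}
```

      Callers that only need at least the requested size would pass false
      and keep the lenient result, while callers that need the exact size
      would pass true and get an error.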
      
      Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20190918095144.955-5-mreitz@redhat.com
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
  10. Aug 19, 2019
  11. Jul 15, 2019
  12. Jun 12, 2019
  13. Apr 02, 2019
  14. Mar 12, 2019
  15. Feb 25, 2019
    • block: Add strong_runtime_opts to BlockDriver · 2654267c
      Hanna Reitz authored
      
      This new field can be set by block drivers to list the runtime options
      they accept that may influence the contents of the respective BDS. As of
      a follow-up patch, this list will be used by the common
      bdrv_refresh_filename() implementation to decide which options to put
      into BDS.full_open_options (and consequently whether a JSON filename has
      to be created), thus freeing the drivers of having to implement that
      logic themselves.
      
      Additionally, this patch adds the field to all of the block drivers that
      need it and sets it accordingly.
      
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Reviewed-by: Alberto Garcia <berto@igalia.com>
      Message-id: 20190201192935.18394-22-mreitz@redhat.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
  16. Jan 11, 2019
    • qemu/queue.h: leave head structs anonymous unless necessary · b58deb34
      Paolo Bonzini authored
      
      Most list head structs need not be given a name.  In most cases the
      name is given just in case one is going to use QTAILQ_LAST, QTAILQ_PREV
      or reverse iteration, but this does not apply to lists of other kinds,
      and even for QTAILQ in practice this is only rarely needed.  In addition,
      we will soon reimplement those macros completely so that they do not
      need a name for the head struct.  So clean up everything, not giving a
      name except in the rare case where it is necessary.
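
      A small sketch of an untagged head struct follows. The macros are
      hypothetical and only loosely modelled on qemu/queue.h; the trailing
      underscores mark them as illustration rather than the real API, and
      Request, Queue and queue_sum() are made-up names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked list macros, loosely modelled on queue.h. */
#define SLIST_HEAD_ANON_(type)  struct { struct type *first; }
#define SLIST_ENTRY_(type)      struct { struct type *next; }
#define SLIST_INSERT_HEAD_(head, elm, field) do {   \
        (elm)->field.next = (head)->first;          \
        (head)->first = (elm);                      \
    } while (0)
#define SLIST_FOREACH_(var, head, field) \
    for ((var) = (head)->first; (var); (var) = (var)->field.next)

struct Request {
    int id;
    SLIST_ENTRY_(Request) link;
};

struct Queue {
    /* No tag on the head struct: only the macros ever touch its members. */
    SLIST_HEAD_ANON_(Request) requests;
};

/* Sum the ids of all queued requests by forward iteration. */
static int queue_sum(struct Queue *q)
{
    struct Request *r;
    int sum = 0;

    SLIST_FOREACH_(r, &q->requests, link) {
        sum += r->id;
    }
    return sum;
}
```

      Forward iteration and insertion never spell out the head's type, so a
      tag would only be needed by code that has to name that type.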
      
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  17. Nov 05, 2018
  18. Jul 23, 2018
  19. Jun 29, 2018
    • block: Convert .bdrv_truncate callback to coroutine_fn · 061ca8a3
      Kevin Wolf authored
      
      bdrv_truncate() is an operation that can block (even for a quite long
      time, depending on the PreallocMode) in I/O paths that shouldn't block.
      Convert it to a coroutine_fn so that we have the infrastructure for
      drivers to make their .bdrv_co_truncate implementation asynchronous.
      
      This change could potentially introduce new race conditions because
      bdrv_truncate() isn't necessarily executed atomically any more. Whether
      this is a problem needs to be evaluated for each block driver that
      supports truncate:
      
      * file-posix/win32, gluster, iscsi, nfs, rbd, ssh, sheepdog: The
        protocol drivers are trivially safe because they don't actually yield
        yet, so there is no change in behaviour.
      
      * copy-on-read, crypto, raw-format: Essentially just filter drivers that
        pass the request to a child node, no problem.
      
      * qcow2: The implementation modifies metadata, so it needs to hold
        s->lock to be safe with concurrent I/O requests. In order to avoid
        double locking, this requires pulling the locking out into
        preallocate_co() and using qcow2_write_caches() instead of
        bdrv_flush().
      
      * qed: Does a single header update, this is fine without locking.
      
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
  20. Jun 15, 2018
    • block: Add block-specific QDict header · 609f45ea
      Hanna Reitz authored
      
      There are numerous QDict functions that have been introduced for and are
      used only by the block layer.  Move their declarations into their own
      header file to reflect that.
      
      While qdict_extract_subqdict() is in fact used outside of the block
      layer (in util/qemu-config.c), it is still a function related very
      closely to how the block layer works with nested QDicts, namely by
      sometimes flattening them.  Therefore, its declaration is put into this
      header as well and util/qemu-config.c includes it with a comment stating
      exactly which function it needs.
      
      Suggested-by: Markus Armbruster <armbru@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Message-Id: <20180509165530.29561-7-mreitz@redhat.com>
      [Copyright note tweaked, superfluous includes dropped]
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  21. May 15, 2018
    • block: Merge .bdrv_co_writev{,_flags} in drivers · e18a58b4
      Eric Blake authored
      
      We have too many driver callback interfaces; simplify the mess
      somewhat by merging the flags parameter of .bdrv_co_writev_flags()
      into .bdrv_co_writev().  Note that as long as a driver doesn't set
      .supported_write_flags, the flags argument will be 0 and behavior is
      identical.  Also note that the public function bdrv_co_writev() still
      lacks a flags argument; so the driver signature is thus intentionally
      slightly different.  But that's not the end of the world, nor the first
      time that the driver interface differs slightly from the public
      interface.
      
      Ideally, we should be rewriting all of these drivers to use modern
      byte-based interfaces.  But that's a more invasive patch to write
      and audit, compared to the simplification done here.
      
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  22. May 04, 2018
  23. Apr 03, 2018
  24. Mar 13, 2018
    • block: include original filename when reporting invalid URIs · 44acd46f
      Daniel P. Berrangé authored
      
      Consider passing a JSON based block driver to "qemu-img commit"
      
      $ qemu-img commit 'json:{"driver":"qcow2","file":{"driver":"gluster",\
                        "volume":"gv0","path":"sn1.qcow2",
                        "server":[{"type":\
      		  "tcp","host":"10.73.199.197","port":"24007"}]},}'
      
      Currently it will commit the content and then report an incredibly
      useless error message when trying to re-open the committed image:
      
        qemu-img: invalid URI
        Usage: file=gluster[+transport]://[host[:port]]volume/path[?socket=...][,file.debug=N][,file.logfile=/path/filename.log]
      
      With this fix we get:
      
        qemu-img: invalid URI json:{"server.0.host": "10.73.199.197",
            "driver": "gluster", "path": "luks.qcow2", "server.0.type":
            "tcp", "server.0.port": "24007", "volume": "gv0"}
      
      Of course the root cause problem still exists, but now we know
      what actually needs fixing.
      
      Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-id: 20180206105204.14817-1-berrange@redhat.com
      Signed-off-by: Jeff Cody <jcody@redhat.com>
  25. Mar 09, 2018
  26. Mar 02, 2018
  27. Feb 13, 2018
  28. Feb 09, 2018