  1. Dec 11, 2020
    • block: Fix deadlock in bdrv_co_yield_to_drain() · 960d5fb3
      Kevin Wolf authored
      If bdrv_co_yield_to_drain() is called for draining a block node that
      runs in a different AioContext, it keeps that AioContext locked while it
      yields and schedules a BH in the AioContext to do the actual drain.
      
      As long as executing the BH is the very next thing that the event loop
      of the node's AioContext does, this actually happens to work, but when
      it tries to execute something else that wants to take the AioContext
      lock, it will deadlock. (In the bug report, this other thing is a
      virtio-scsi device running virtio_scsi_data_plane_handle_cmd().)
      
      Instead, always drop the AioContext lock across the yield and reacquire
      it only when the coroutine is reentered. The BH needs to unconditionally
      take the lock for itself now.
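      A minimal sketch of the corrected pattern (names are simplified and
      drain_bh_cb is a hypothetical callback; the real function is
      bdrv_co_yield_to_drain() with more bookkeeping around the BH):
      
        static void coroutine_fn yield_to_drain_sketch(BlockDriverState *bs)
        {
            AioContext *ctx = bdrv_get_aio_context(bs);
      
            /* Schedule the BH that performs the actual drain in the
             * node's AioContext. */
            aio_bh_schedule_oneshot(ctx, drain_bh_cb, bs);
      
            /* The old code yielded with ctx still locked; now drop the
             * lock so the event loop can run other lock-taking work. */
            aio_context_release(ctx);
            qemu_coroutine_yield();
      
            /* Reacquire only once the BH has reentered the coroutine. */
            aio_context_acquire(ctx);
        }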
      
      This fixes the 'block_resize' QMP command on a block node that runs in
      an iothread.
      
      Cc: qemu-stable@nongnu.org
      Fixes: eb94b81a
      Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1903511
      
      
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Message-Id: <20201203172311.68232-4-kwolf@redhat.com>
      Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: introduce BDRV_MAX_LENGTH · 8b117001
      Vladimir Sementsov-Ogievskiy authored
      
      We are going to modify the block layer to work with 64-bit requests.
      The first step is moving to the int64_t type for both the offset and
      bytes arguments in all block request related functions.
      
      It's mostly safe (when widening signed or unsigned int to int64_t), but
      switching from uint64_t is questionable.
      
      So, let's first establish the set of requests we want to work with.
      First, signed int64_t should be enough, as off_t is signed anyway.
      Then, obviously, offset + bytes should not overflow.
      
      Most interesting: (offset + bytes) should not overflow when aligned
      up either. Aligned to what alignment? The first thing that comes to
      mind is bs->bl.request_alignment, as we align requests up to this
      alignment. But there is another case: bdrv_mark_request_serialising()
      aligns a request up to some given alignment, and that parameter may
      be bdrv_get_cluster_size(), which is often a lot greater than
      bs->bl.request_alignment.
      Note also that bdrv_mark_request_serialising() uses signed int64_t
      for its calculations, so we already depend on some restrictions.
      
      Happily, bdrv_get_cluster_size() returns int, and
      bs->bl.request_alignment has a 32-bit unsigned type but is defined
      to be a power of 2 less than INT_MAX. So we may establish that
      INT_MAX is the absolute maximum for any kind of alignment that may
      occur with a request.
      
      Note that bdrv_get_cluster_size() is not documented to return a
      power of 2, yet bdrv_mark_request_serialising() behaves as if it
      does. Also, backup uses bdi.cluster_size and is not prepared for it
      not being a power of 2.
      So, let's establish that QEMU supports only power-of-2 cluster sizes
      and alignments.
      
      So, alignment can't be greater than 2^30.
      
      Finally, to be safe with calculations and avoid computing different
      maximums for different nodes (depending on cluster size and
      request_alignment), let's simply set QEMU_ALIGN_DOWN(INT64_MAX, 2^30)
      as the absolute maximum byte length for QEMU. It's not much less
      than INT64_MAX anyway.
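      In code, the resulting limits look roughly like this (a sketch;
      QEMU_ALIGN_DOWN is QEMU's existing rounding macro):
      
        #define BDRV_MAX_ALIGNMENT (1L << 30) /* no alignment beyond 2^30 */
        #define BDRV_MAX_LENGTH (QEMU_ALIGN_DOWN(INT64_MAX, BDRV_MAX_ALIGNMENT))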
      
      OK, then, let's apply it to block/io.
      
      Let's consider all block/io entry points taking offset/bytes:
      
      Four bytes/offset interface functions: bdrv_co_preadv_part(),
      bdrv_co_pwritev_part(), bdrv_co_copy_range_internal() and
      bdrv_co_pdiscard(); we check them all with bdrv_check_request().
      
      We also have one entry point with only an offset: bdrv_co_truncate();
      check the offset there.
      
      And one public structure: BdrvTrackedRequest. Happily, it has only
      three external users:
      
       file-posix.c: adapted by this patch
       write-threshold.c: only reads fields
       test-write-threshold.c: sets obviously small constant values
      
      It would be better to make the structure private and add
      corresponding interfaces, but it's not obvious what kind of
      interface file-posix.c needs. Let's keep it public but add
      corresponding assertions, as sketched below.
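      A simplified sketch of the validation applied at these entry points
      (the real bdrv_check_request() may differ in details):
      
        static int bdrv_check_request_sketch(int64_t offset, int64_t bytes)
        {
            if (offset < 0 || bytes < 0) {
                return -EIO;
            }
            if (bytes > BDRV_MAX_LENGTH ||
                offset > BDRV_MAX_LENGTH - bytes) {
                return -EIO;  /* offset + bytes would exceed the maximum */
            }
            return 0;
        }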
      
      After this patch we'll convert the functions in block/io.c to
      int64_t bytes and offset parameters. We can assume that an
      offset/bytes pair always satisfies the new restrictions, and add
      corresponding assertions where needed. If some offset/bytes point in
      block/io.c can be reached without passing through
      bdrv_check_request(), it is considered a bug. Likewise, if
      block/io.c expands an offset/bytes request by more than aligning it
      up to request_alignment, that's a bug too.
      
      For all I/O requests except discard we keep, for now, the old
      restriction of a 32-bit request length.
      
      The error message output of iotest 206 changed, as the test disk
      size is now larger than the new limit. Add one more test case with
      the new maximum disk size to cover the too-big-L1 case.
      
      Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Message-Id: <20201203222713.13507-5-vsementsov@virtuozzo.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block/io: bdrv_check_byte_request(): drop bdrv_is_inserted() · f4dad307
      Vladimir Sementsov-Ogievskiy authored
      
      Move bdrv_is_inserted() calls into callers.
      
      We are going to make bdrv_check_byte_request() purely a request
      check. bdrv_is_inserted() is not about checking the request, it's
      about checking the bs, so it should be separate.
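      A sketch of the resulting caller shape (signatures simplified; the
      real call sites are the request entry points in block/io.c):
      
        static int do_request_sketch(BlockDriverState *bs,
                                     int64_t offset, int64_t bytes)
        {
            /* The medium check now lives in the caller... */
            if (!bdrv_is_inserted(bs)) {
                return -ENOMEDIUM;  /* checking the bs, not the request */
            }
            /* ...so the request check is purely about offset/bytes. */
            return bdrv_check_byte_request(bs, offset, bytes);
        }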
      
      With this patch we probably change the error path for some failure
      scenarios. But it would be very strange for anything to depend on
      the fact that querying a too-big request on an empty cdrom (or a
      corrupted qcow2 node with no drv) results in EIO rather than
      ENOMEDIUM. Moreover, we are going to move to 64-bit requests, so
      larger requests will be allowed anyway.
      
      Moreover, keeping in mind that cdrom is the only driver that has a
      .bdrv_is_inserted() handler, it's strange that the generic block
      layer should care so much about it; intuitively, we should just do
      the read or write, and the cdrom driver should return the correct
      errors if no medium is inserted. But that's work for another series.
      
      Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Message-Id: <20201203222713.13507-4-vsementsov@virtuozzo.com>
      Reviewed-by: Alberto Garcia <berto@igalia.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block/io: bdrv_refresh_limits(): use ERRP_GUARD · 33985614
      Vladimir Sementsov-Ogievskiy authored
      
      This simplifies the following commit.
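      For illustration, a sketch of what the macro enables (ERRP_GUARD()
      is the existing helper from qapi/error.h):
      
        void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
        {
            ERRP_GUARD();
      
            /* ... driver callbacks that may set *errp ... */
      
            if (*errp) {
                /* Safe even if the caller passed NULL or error_fatal:
                 * ERRP_GUARD() guarantees errp points at a local Error *. */
                return;
            }
        }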
      
      Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Message-Id: <20201203222713.13507-3-vsementsov@virtuozzo.com>
      Reviewed-by: Alberto Garcia <berto@igalia.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  2. Oct 27, 2020
    • block: End quiescent sections when a BDS is deleted · 1a6d3bd2
      Greg Kurz authored
      If a BDS gets deleted during blk_drain_all(), it might miss a
      call to bdrv_do_drained_end(). This means missing a call to
      aio_enable_external() and the AIO context remains disabled
      forever. This can cause a device to become unresponsive and to
      disrupt the guest execution, i.e. hang, loop forever or worse.
      
      This scenario is quite easy to encounter with virtio-scsi
      on POWER when punching multiple blockdev-create QMP commands
      while the guest is booting and it is still running the SLOF
      firmware. This happens because SLOF disables/re-enables PCI
      devices multiple times via IO/MEM/MASTER bits of PCI_COMMAND
      register after the initial probe/feature negotiation, as it
      tends to work with a single device at a time at various stages
      like probing and running block/network bootloaders without
      doing a full reset in-between. This naturally generates many
      dataplane stops and starts, and thus many drain sections that
      can race with blockdev_create_run(). In the end, SLOF bails
      out.
      
      It is somewhat reproducible on x86, but it requires generating
      artificial dataplane start/stop activity with stop/cont QMP
      commands. In this case, SeaBIOS ends up looping forever,
      waiting for the virtio-scsi device to send a response to
      a command it never received.
      
      Add a helper that pairs all previously called bdrv_do_drained_begin()
      with a bdrv_do_drained_end() and call it from bdrv_close().
      While at it, update the "/bdrv-drain/graph-change/drain_all"
      test in test-bdrv-drain so that it can catch the issue.
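      A sketch of the helper's shape (the name is hypothetical; the point
      is to balance every pending drained_begin before the BDS goes away):
      
        static void bdrv_end_quiescent_sections_sketch(BlockDriverState *bs)
        {
            /* quiesce_counter tracks how many drain sections are open. */
            while (bs->quiesce_counter) {
                bdrv_drained_end(bs);
            }
        }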
      
      BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1874441
      
      
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Message-Id: <160346526998.272601.9045392804399803158.stgit@bahia.lan>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • qcow2: Skip copy-on-write when allocating a zero cluster · 46cd1e8a
      Alberto Garcia authored
      
      Since commit c8bb23cb, when a write
      request results in a new allocation, QEMU first tries to see whether
      the rest of the cluster outside the written area contains only
      zeroes.
      
      In that case, instead of doing a normal copy-on-write operation and
      writing explicit zero buffers to disk, the code zeroes the whole
      cluster efficiently using pwrite_zeroes() with BDRV_REQ_NO_FALLBACK.
      
      This improves performance very significantly but it only happens when
      we are writing to an area that was completely unallocated before. Zero
      clusters (QCOW2_CLUSTER_ZERO_*) are treated like normal clusters and
      are therefore slower to allocate.
      
      This happens because the code uses bdrv_is_allocated_above() rather
      than bdrv_block_status_above(). The former is not as accurate for
      this purpose, but it is faster. However, in the case of qcow2 the
      underlying call already reports zero clusters just fine, so there is
      no reason why we cannot use that information.
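      A sketch of the kind of query this enables (simplified from the
      actual qcow2 allocation path):
      
        /* Does the given range read back as zeroes?
         * bdrv_block_status_above() reports BDRV_BLOCK_ZERO for qcow2
         * zero clusters, which bdrv_is_allocated_above() cannot
         * distinguish. */
        static bool reads_as_zero_sketch(BlockDriverState *bs,
                                         int64_t offset, int64_t bytes)
        {
            int64_t pnum;
            int ret = bdrv_block_status_above(bs, NULL, offset, bytes,
                                              &pnum, NULL, NULL);
            return ret >= 0 && (ret & BDRV_BLOCK_ZERO) && pnum >= bytes;
        }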
      
      In tests of 4 KB writes on an image that contains only zero
      clusters, this patch results in almost five times more IOPS.
      
      Signed-off-by: Alberto Garcia <berto@igalia.com>
      Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Message-Id: <6d77cab968c501c44d6e1089b9bc91b04170b49e.1603731354.git.berto@igalia.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • qcow2: Report BDRV_BLOCK_ZERO more accurately in bdrv_co_block_status() · d40f4a56
      Alberto Garcia authored
      
      If a BlockDriverState supports backing files but has none, then any
      unallocated area reads back as zeroes.
      
      bdrv_co_block_status() only reports this if want_zero is true, but
      this is an inexpensive test and there is no reason not to do it in
      all cases.
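      A simplified sketch of the resulting logic in bdrv_co_block_status()
      (the exact surrounding code differs):
      
        /* The backing-file test no longer hides behind want_zero: an
         * unallocated range with no backing file always reads as zeroes. */
        if (!(ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
            bs->drv->supports_backing && !bs->backing) {
            ret |= BDRV_BLOCK_ZERO;
        }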
      
      Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Signed-off-by: Alberto Garcia <berto@igalia.com>
      Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Message-Id: <66fa0914a0e2b727ab6d1b63ca773d7cd29a9a9e.1603731354.git.berto@igalia.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  3. Sep 23, 2020
    • qemu/atomic.h: rename atomic_ to qatomic_ · d73415a3
      Stefan Hajnoczi authored
      
      clang's C11 atomic_fetch_*() functions only take a C11 atomic type
      pointer argument. QEMU uses direct types (int, etc.) and this causes
      a compiler error when QEMU code calls these functions in a source
      file that also includes <stdatomic.h> via a system header file:
      
        $ CC=clang CXX=clang++ ./configure ... && make
        ../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int *' invalid)
      
      Avoid using atomic_*() names in QEMU's atomic.h since that namespace is
      used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h
      and <stdatomic.h> can co-exist. I checked /usr/include on my machine and
      searched GitHub for existing "qatomic_" users but there seem to be none.
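      An illustrative before/after of the rename (the hunk is made up for
      illustration; only the prefix changes, the semantics stay the same):
      
        -    seq = atomic_read(&ctx->notify_me);
        +    seq = qatomic_read(&ctx->notify_me);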
      
      This patch was generated using:
      
        $ git grep -h -o '\<atomic\(64\)\?_[a-z0-9_]\+' include/qemu/atomic.h | \
          sort -u >/tmp/changed_identifiers
        $ for identifier in $(</tmp/changed_identifiers); do
              sed -i "s%\<$identifier\>%q$identifier%g" \
                  $(git grep -I -l "\<$identifier\>")
          done
      
      I manually fixed line-wrap issues and misaligned rST tables.
      
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20200923105646.47864-1-stefanha@redhat.com>