Skip to content
Snippets Groups Projects
  1. Sep 20, 2023
  2. Sep 16, 2023
  3. Sep 15, 2023
  4. Sep 08, 2023
    • Philippe Mathieu-Daudé's avatar
      util/iov: Avoid dynamic stack allocation · 522a9b94
      Philippe Mathieu-Daudé authored
      
      Use autofree heap allocation instead of variable-length array on the
      stack.
      
      The codebase has very few VLAs, and if we can get rid of them all we
      can make the compiler error on new additions.  This is a defensive
      measure against security bugs where an on-stack dynamic allocation
      isn't correctly size-checked (e.g.  CVE-2021-3527).
      
      Signed-off-by: default avatarPhilippe Mathieu-Daudé <philmd@redhat.com>
      Signed-off-by: default avatarPeter Maydell <peter.maydell@linaro.org>
      Message-ID: <20230824164706.2652277-1-peter.maydell@linaro.org>
      Reviewed-by: default avatarEric Blake <eblake@redhat.com>
      Signed-off-by: default avatarEric Blake <eblake@redhat.com>
      522a9b94
    • Stefan Hajnoczi's avatar
      io: follow coroutine AioContext in qio_channel_yield() · 06e0f098
      Stefan Hajnoczi authored
      
      The ongoing QEMU multi-queue block layer effort makes it possible for multiple
      threads to process I/O in parallel. The nbd block driver is not compatible with
      the multi-queue block layer yet because QIOChannel cannot be used easily from
      coroutines running in multiple threads. This series changes the QIOChannel API
      to make that possible.
      
      In the current API, calling qio_channel_attach_aio_context() sets the
      AioContext where qio_channel_yield() installs an fd handler prior to yielding:
      
        qio_channel_attach_aio_context(ioc, my_ctx);
        ...
        qio_channel_yield(ioc); // my_ctx is used here
        ...
        qio_channel_detach_aio_context(ioc);
      
      This API design has limitations: reading and writing must be done in the same
      AioContext and moving between AioContexts involves a cumbersome sequence of API
      calls that is not suitable for doing on a per-request basis.
      
      There is no fundamental reason why a QIOChannel needs to run within the
      same AioContext every time qio_channel_yield() is called. QIOChannel
      only uses the AioContext while inside qio_channel_yield(). The rest of
      the time, QIOChannel is independent of any AioContext.
      
      In the new API, qio_channel_yield() queries the AioContext from the current
      coroutine using qemu_coroutine_get_aio_context(). There is no need to
      explicitly attach/detach AioContexts anymore and
      qio_channel_attach_aio_context() and qio_channel_detach_aio_context() are gone.
      One coroutine can read from the QIOChannel while another coroutine writes from
      a different AioContext.
      
      This API change allows the nbd block driver to use QIOChannel from any thread.
      It's important to keep in mind that the block driver already synchronizes
      QIOChannel access and ensures that two coroutines never read simultaneously or
      write simultaneously.
      
      This patch updates all users of qio_channel_attach_aio_context() to the
      new API. Most conversions are simple, but vhost-user-server requires a
      new qemu_coroutine_yield() call to quiesce the vu_client_trip()
      coroutine when not attached to any AioContext.
      
      While the API is has become simpler, there is one wart: QIOChannel has a
      special case for the iohandler AioContext (used for handlers that must not run
      in nested event loops). I didn't find an elegant way preserve that behavior, so
      I added a new API called qio_channel_set_follow_coroutine_ctx(ioc, true|false)
      for opting in to the new AioContext model. By default QIOChannel uses the
      iohandler AioHandler. Code that formerly called
      qio_channel_attach_aio_context() now calls
      qio_channel_set_follow_coroutine_ctx(ioc, true) once after the QIOChannel is
      created.
      
      Signed-off-by: default avatarStefan Hajnoczi <stefanha@redhat.com>
      Reviewed-by: default avatarEric Blake <eblake@redhat.com>
      Acked-by: default avatarDaniel P. Berrangé <berrange@redhat.com>
      Message-ID: <20230830224802.493686-5-stefanha@redhat.com>
      [eblake: also fix migration/rdma.c]
      Signed-off-by: default avatarEric Blake <eblake@redhat.com>
      06e0f098
  5. Sep 01, 2023
  6. Aug 31, 2023
  7. Aug 30, 2023
    • Stefan Hajnoczi's avatar
      aio-posix: zero out io_uring sqe user_data · 87ec6f55
      Stefan Hajnoczi authored
      
      liburing does not clear sqe->user_data. We must do it ourselves to avoid
      undefined behavior in process_cqe() when user_data is used.
      
      Note that fdmon-io_uring is currently disabled, so this is a latent bug
      that does not affect users. Let's merge this fix now to make it easier
      to enable fdmon-io_uring in the future (and I'm working on that).
      
      Signed-off-by: default avatarStefan Hajnoczi <stefanha@redhat.com>
      Message-ID: <20230426212639.82310-1-stefanha@redhat.com>
      87ec6f55
  8. Aug 29, 2023
  9. Aug 09, 2023
  10. Aug 08, 2023
  11. Aug 03, 2023
  12. Aug 01, 2023
    • Anthony PERARD's avatar
      thread-pool: signal "request_cond" while locked · f4f71363
      Anthony PERARD authored
      thread_pool_free() might have been called on the `pool`, which would
      be a reason for worker_thread() to quit. In this case,
      `pool->request_cond` is been destroyed.
      
      If worker_thread() didn't managed to signal `request_cond` before it
      been destroyed by thread_pool_free(), we got:
          util/qemu-thread-posix.c:198: qemu_cond_signal: Assertion `cond->initialized' failed.
      
      One backtrace:
          __GI___assert_fail (assertion=0x55555614abcb "cond->initialized", file=0x55555614ab88 "util/qemu-thread-posix.c", line=198,
      	function=0x55555614ad80 <__PRETTY_FUNCTION__.17104> "qemu_cond_signal") at assert.c:101
          qemu_cond_signal (cond=0x7fffb800db30) at util/qemu-thread-posix.c:198
          worker_thread (opaque=0x7fffb800dab0) at util/thread-pool.c:129
          qemu_thread_start (args=0x7fffb8000b20) at util/qemu-thread-posix.c:505
          start_thread (arg=<optimized out>) at pthread_create.c:486
      
      Reported here:
          https://lore.kernel.org/all/ZJwoK50FcnTSfFZ8@MacBook-Air-de-Roger.local/T/#u
      
      
      
      To avoid issue, keep lock while sending a signal to `request_cond`.
      
      Fixes: 900fa208 ("thread-pool: replace semaphore with condition variable")
      Signed-off-by: default avatarAnthony PERARD <anthony.perard@citrix.com>
      Reviewed-by: default avatarStefan Hajnoczi <stefanha@redhat.com>
      Message-Id: <20230714152720.5077-1-anthony.perard@citrix.com>
      Signed-off-by: default avatarAnthony PERARD <anthony.perard@citrix.com>
      f4f71363
  13. Jul 31, 2023
  14. Jul 10, 2023
  15. Jul 08, 2023
  16. Jun 27, 2023
    • Marc-André Lureau's avatar
      console/win32: allocate shareable display surface · 09b4c198
      Marc-André Lureau authored
      
      Introduce qemu_win32_map_alloc() and qemu_win32_map_free() to allocate
      shared memory mapping. The handle can be used to share the mapping with
      another process.
      
      Teach qemu_create_displaysurface() to allocate shared memory. Following
      patches will introduce other places for shared memory allocation.
      
      Other patches for -display dbus will share the memory when possible with
      the client, to avoid expensive memory copy between the processes.
      
      Signed-off-by: default avatarMarc-André Lureau <marcandre.lureau@redhat.com>
      Message-Id: <20230606115658.677673-10-marcandre.lureau@redhat.com>
      09b4c198
  17. Jun 13, 2023
  18. Jun 06, 2023
    • Paolo Bonzini's avatar
      atomics: eliminate mb_read/mb_set · 06831001
      Paolo Bonzini authored
      
      qatomic_mb_read and qatomic_mb_set were the very first atomic primitives
      introduced for QEMU; their semantics are unclear and they provide a false
      sense of safety.
      
      The last use of qatomic_mb_read() has been removed, so delete it.
      qatomic_mb_set() instead can survive as an optimized
      qatomic_set()+smp_mb(), similar to Linux's smp_store_mb(), but
      rename it to qatomic_set_mb() to match the order of the two
      operations.
      
      Reviewed-by: default avatarRichard Henderson <richard.henderson@linaro.org>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      06831001
  19. Jun 05, 2023
    • Hanna Czenczek's avatar
      util/iov: Remove qemu_iovec_init_extended() · cc63f6f6
      Hanna Czenczek authored
      
      bdrv_pad_request() was the main user of qemu_iovec_init_extended().
      HEAD^ has removed that use, so we can remove qemu_iovec_init_extended()
      now.
      
      The only remaining user is qemu_iovec_init_slice(), which can easily
      inline the small part it really needs.
      
      Note that qemu_iovec_init_extended() offered a memcpy() optimization to
      initialize the new I/O vector.  qemu_iovec_concat_iov(), which is used
      to replace its functionality, does not, but calls qemu_iovec_add() for
      every single element.  If we decide this optimization was important, we
      will need to re-implement it in qemu_iovec_concat_iov(), which might
      also benefit its pre-existing users.
      
      Reviewed-by: default avatarEric Blake <eblake@redhat.com>
      Reviewed-by: default avatarVladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
      Signed-off-by: default avatarHanna Czenczek <hreitz@redhat.com>
      Message-Id: <20230411173418.19549-4-hreitz@redhat.com>
      cc63f6f6
    • Hanna Czenczek's avatar
      util/iov: Make qiov_slice() public · 3d06cea8
      Hanna Czenczek authored
      
      We want to inline qemu_iovec_init_extended() in block/io.c for padding
      requests, and having access to qiov_slice() is useful for this.  As a
      public function, it is renamed to qemu_iovec_slice().
      
      (We will need to count the number of I/O vector elements of a slice
      there, and then later process this slice.  Without qiov_slice(), we
      would need to call qemu_iovec_subvec_niov(), and all further
      IOV-processing functions may need to skip prefixing elements to
      accomodate for a qiov_offset.  Because qemu_iovec_subvec_niov()
      internally calls qiov_slice(), we can just have the block/io.c code call
      qiov_slice() itself, thus get the number of elements, and also create an
      iovec array with the superfluous prefixing elements stripped, so the
      following processing functions no longer need to skip them.)
      
      Reviewed-by: default avatarEric Blake <eblake@redhat.com>
      Reviewed-by: default avatarVladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
      Signed-off-by: default avatarHanna Czenczek <hreitz@redhat.com>
      Message-Id: <20230411173418.19549-2-hreitz@redhat.com>
      3d06cea8
  20. Jun 02, 2023
    • Eric Blake's avatar
      cutils: Improve qemu_strtosz handling of fractions · 42cc08d1
      Eric Blake authored
      We have several limitations and bugs worth fixing; they are
      inter-related enough that it is not worth splitting this patch into
      smaller pieces:
      
      * ".5k" should work to specify 512, just as "0.5k" does
      * "1.9999k" and "1." + "9"*50 + "k" should both produce the same
        result of 2048 after rounding
      * "1." + "0"*350 + "1B" should not be treated the same as "1.0B";
        underflow in the fraction should not be lost
      * "7.99e99" and "7.99e999" look similar, but our code was doing a
        read-out-of-bounds on the latter because it was not expecting ERANGE
        due to overflow. While we document that scientific notation is not
        supported, and the previous patch actually fixed
        qemu_strtod_finite() to no longer return ERANGE overflows, it is
        easier to pre-filter than to try and determine after the fact if
        strtod() consumed more than we wanted.  Note that this is a
        low-level semantic change (when endptr is not NULL, we can now
        successfully parse with a scale of 'E' and then report trailing
        junk, instead of failing outright with EINVAL); but an earlier
        commit already argued that this is not a high-level semantic change
        since the only caller passing in a non-NULL endptr also checks that
        the tail is whitespace-only.
      
      Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1629
      
      
      Fixes: cf923b78 ("utils: Improve qemu_strtosz() to have 64 bits of precision", 6.0.0)
      Fixes: 7625a1ed ("utils: Use fixed-point arithmetic in qemu_strtosz", 6.0.0)
      Signed-off-by: default avatarEric Blake <eblake@redhat.com>
      Reviewed-by: default avatarHanna Czenczek <hreitz@redhat.com>
      Message-Id: <20230522190441.64278-20-eblake@redhat.com>
      [eblake: tweak function comment for accuracy]
      42cc08d1
    • Eric Blake's avatar
      cutils: Improve qemu_strtod* error paths · c25b1683
      Eric Blake authored
      Previous patches changed all integral qemu_strto*() error paths to
      guarantee that *value is never left uninitialized.  Do likewise for
      qemu_strtod.  Also, tighten qemu_strtod_finite() to never return a
      non-finite value (prior to this patch, we were rejecting "inf" with
      -EINVAL and unspecified result 0.0, but failing "9e999" with -ERANGE
      and HUGE_VAL - which is infinite on IEEE machines - despite our
      function claiming to recognize only finite values).
      
      Auditing callers, we have no external callers of qemu_strtod, and
      among the callers of qemu_strtod_finite:
      
      - qapi/qobject-input-visitor.c:qobject_input_type_number_keyval() and
        qapi/string-input-visitor.c:parse_type_number() which reject all
        errors (does not matter what we store)
      
      - utils/cutils.c:do_strtosz() incorrectly assumes that *endptr points
        to '.' on all failures (that is, it is not distinguishing between
        EINVAL and ERANGE; and therefore still does the WRONG THING for
        "9.9e999".  The change here does not entirely fix that (a later
        patch will tackle this more systematically), but at least it fixes
        the read-out-of-bounds first diagnosed in
        https://gitlab.com/qemu-project/qemu/-/issues/1629
      
      
      
      - our testsuite, which we can update to match what we document
      
      Signed-off-by: default avatarEric Blake <eblake@redhat.com>
      Reviewed-by: default avatarHanna Czenczek <hreitz@redhat.com>
      CC: qemu-stable@nongnu.org
      Message-Id: <20230522190441.64278-19-eblake@redhat.com>
      c25b1683
    • Eric Blake's avatar
      cutils: Use parse_uint in qemu_strtosz for negative rejection · b87ac966
      Eric Blake authored
      
      Rather than open-coding two different ways to check for an unwanted
      negative sign, reuse the same code in both functions.  That way, if we
      decide down the road to accept "-0" instead of rejecting it, we have
      fewer places to change.  Also, it means we now get ERANGE instead of
      EINVAL for negative values in qemu_strtosz, which is reasonable for
      what it represents.  This in turn changes the expected output of a
      couple of iotests.
      
      The change is not quite complete: negative fractional scaled values
      can trip us up.  This will be fixed in a later patch addressing other
      issues with fractional scaled values.
      
      Signed-off-by: default avatarEric Blake <eblake@redhat.com>
      Reviewed-by: default avatarHanna Czenczek <hreitz@redhat.com>
      Message-Id: <20230522190441.64278-18-eblake@redhat.com>
      b87ac966
    • Eric Blake's avatar
      cutils: Set value in all integral qemu_strto* error paths · 3c5f2467
      Eric Blake authored
      
      Our goal in writing qemu_strtoi() and friends is to have an interface
      harder to abuse than libc's strtol().  Leaving the return value
      uninitialized on some but not all error paths does not lend itself
      well to this goal; and our documentation wasn't helpful on what to
      expect.
      
      Note that the previous patch changed all qemu_strtosz() EINVAL error
      paths to slam value to 0 rather than stay uninitialized, even when the
      EINVAL eror occurs because of trailing junk.  But for the remaining
      integral qemu_strto*, it's easier to return the parsed value than to
      force things back to zero, in part because of how check_strtox_error
      works; in part because people expect that from libc strto* (while
      there is no libc strtosz to compare to), and in part because doing so
      creates less churn in the testsuite.
      
      Here, the list of affected callers is much longer ('git grep
      "qemu_strto[ui]" "*.c" "**/*.c" | grep -v tests/ |wc -l' outputs 107,
      although a few of those are the implementation in in cutils.c), so
      touching as little as possible is the wisest course of action.
      
      Signed-off-by: default avatarEric Blake <eblake@redhat.com>
      Reviewed-by: default avatarHanna Czenczek <hreitz@redhat.com>
      Message-Id: <20230522190441.64278-17-eblake@redhat.com>
      3c5f2467
    • Eric Blake's avatar
      cutils: Set value in all qemu_strtosz* error paths · 896fbd90
      Eric Blake authored
      
      Making callers determine whether or not *value was populated on error
      is not nice for usability.  Pre-patch, we have unit tests that check
      that *result is left unchanged on most EINVAL errors and set to 0 on
      many ERANGE errors.  This is subtly different from libc strtoumax()
      behavior which returns UINT64_MAX on ERANGE errors, as well as
      different from our parse_uint() which slams to 0 on EINVAL on the
      grounds that we want our functions to be harder to mis-use than
      strtoumax().
      
      Let's audit callers:
      
      - hw/core/numa.c:parse_numa() fixed in the previous patch to check for
        errors
      
      - migration/migration-hmp-cmds.c:hmp_migrate_set_parameter(),
        monitor/hmp.c:monitor_parse_arguments(),
        qapi/opts-visitor.c:opts_type_size(),
        qapi/qobject-input-visitor.c:qobject_input_type_size_keyval(),
        qemu-img.c:cvtnum_full(), qemu-io-cmds.c:cvtnum(),
        target/i386/cpu.c:x86_cpu_parse_featurestr(), and
        util/qemu-option.c:parse_option_size() appear to reject all failures
        (although some with distinct messages for ERANGE as opposed to
        EINVAL), so it doesn't matter what is in the value parameter on
        error.
      
      - All remaining callers are in the testsuite, where we can tweak our
        expectations to match our new desired behavior.
      
      Advancing to the end of the string parsed on overflow (ERANGE), while
      still returning 0, makes sense (UINT64_MAX as a size is unlikely to be
      useful); likewise, our size parsing code is complex enough that it's
      easier to always return 0 when endptr is NULL but trailing garbage was
      found, rather than trying to return the value of the prefix actually
      parsed (no current caller cared about the value of the prefix).
      
      Signed-off-by: default avatarEric Blake <eblake@redhat.com>
      Reviewed-by: default avatarHanna Czenczek <hreitz@redhat.com>
      Message-Id: <20230522190441.64278-16-eblake@redhat.com>
      896fbd90
Loading