Skip to content
Snippets Groups Projects
  1. May 09, 2022
    • Nicolas Saenz Julienne's avatar
      util/event-loop-base: Introduce options to set the thread pool size · 71ad4713
      Nicolas Saenz Julienne authored
      
      The thread pool regulates itself: when idle, it kills threads until
      empty, when in demand, it creates new threads until full. This behaviour
      doesn't play well with latency sensitive workloads where the price of
      creating a new thread is too high. For example, when paired with qemu's
      '-mlock', or using safety features like SafeStack, creating a new thread
      has been measured take multiple milliseconds.
      
      In order to mitigate this let's introduce a new 'EventLoopBase'
      property to set the thread pool size. The threads will be created during
      the pool's initialization or upon updating the property's value, remain
      available during its lifetime regardless of demand, and destroyed upon
      freeing it. A properly characterized workload will then be able to
      configure the pool to avoid any latency spikes.
      
      Signed-off-by: default avatarNicolas Saenz Julienne <nsaenzju@redhat.com>
      Reviewed-by: default avatarStefan Hajnoczi <stefanha@redhat.com>
      Acked-by: default avatarMarkus Armbruster <armbru@redhat.com>
      Message-id: 20220425075723.20019-4-nsaenzju@redhat.com
      Signed-off-by: default avatarStefan Hajnoczi <stefanha@redhat.com>
      71ad4713
    • Nicolas Saenz Julienne's avatar
      Introduce event-loop-base abstract class · 7d5983e3
      Nicolas Saenz Julienne authored
      
      Introduce the 'event-loop-base' abstract class, it'll hold the
      properties common to all event loops and provide the necessary hooks for
      their creation and maintenance. Then have iothread inherit from it.
      
      EventLoopBaseClass is defined as user creatable and provides a hook for
      its children to attach themselves to the user creatable class 'complete'
      function. It also provides an update_params() callback to propagate
      property changes onto its children.
      
      The new 'event-loop-base' class will live in the root directory. It is
      built on its own using the 'link_whole' option (there are no direct
      function dependencies between the class and its children, it all happens
      trough 'constructor' magic). And also imposes new compilation
      dependencies:
      
          qom <- event-loop-base <- blockdev (iothread.c)
      
      And in subsequent patches:
      
          qom <- event-loop-base <- qemuutil (util/main-loop.c)
      
      All this forced some amount of reordering in meson.build:
      
       - Moved qom build definition before qemuutil. Doing it the other way
         around (i.e. moving qemuutil after qom) isn't possible as a lot of
         core libraries that live in between the two depend on it.
      
       - Process the 'hw' subdir earlier, as it introduces files into the
         'qom' source set.
      
      No functional changes intended.
      
      Signed-off-by: default avatarNicolas Saenz Julienne <nsaenzju@redhat.com>
      Reviewed-by: default avatarStefan Hajnoczi <stefanha@redhat.com>
      Acked-by: default avatarMarkus Armbruster <armbru@redhat.com>
      Message-id: 20220425075723.20019-2-nsaenzju@redhat.com
      Signed-off-by: default avatarStefan Hajnoczi <stefanha@redhat.com>
      7d5983e3
  2. Oct 07, 2021
  3. Jul 21, 2021
  4. Jun 18, 2021
    • Paolo Bonzini's avatar
      async: the main AioContext is only "current" if under the BQL · 5f50be9b
      Paolo Bonzini authored
      
      If we want to wake up a coroutine from a worker thread, aio_co_wake()
      currently does not work.  In that scenario, aio_co_wake() calls
      aio_co_enter(), but there is no current AioContext and therefore
      qemu_get_current_aio_context() returns the main thread.  aio_co_wake()
      then attempts to call aio_context_acquire() instead of going through
      aio_co_schedule().
      
      The default case of qemu_get_current_aio_context() was added to cover
      synchronous I/O started from the vCPU thread, but the main and vCPU
      threads are quite different.  The main thread is an I/O thread itself,
      only running a more complicated event loop; the vCPU thread instead
      is essentially a worker thread that occasionally calls
      qemu_mutex_lock_iothread().  It is only in those critical sections
      that it acts as if it were the home thread of the main AioContext.
      
      Therefore, this patch detaches qemu_get_current_aio_context() from
      iothreads, which is a useless complication.  The AioContext pointer
      is stored directly in the thread-local variable, including for the
      main loop.  Worker threads (including vCPU threads) optionally behave
      as temporary home threads if they have taken the big QEMU lock,
      but if that is not the case they will always schedule coroutines
      on remote threads via aio_co_schedule().
      
      With this change, the stub qemu_mutex_iothread_locked() must be changed
      from true to false.  The previous value of true was needed because the
      main thread did not have an AioContext in the thread-local variable,
      but now it does have one.
      
      Reported-by: default avatarVladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20210609122234.544153-1-pbonzini@redhat.com>
      Reviewed-by: default avatarVladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Tested-by: default avatarVladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      [eblake: tweak commit message per Vladimir's review]
      Signed-off-by: default avatarEric Blake <eblake@redhat.com>
      5f50be9b
  5. Feb 10, 2021
  6. Jan 28, 2021
  7. Sep 23, 2020
    • Stefan Hajnoczi's avatar
      qemu/atomic.h: rename atomic_ to qatomic_ · d73415a3
      Stefan Hajnoczi authored
      
      clang's C11 atomic_fetch_*() functions only take a C11 atomic type
      pointer argument. QEMU uses direct types (int, etc) and this causes a
      compiler error when a QEMU code calls these functions in a source file
      that also included <stdatomic.h> via a system header file:
      
        $ CC=clang CXX=clang++ ./configure ... && make
        ../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int *' invalid)
      
      Avoid using atomic_*() names in QEMU's atomic.h since that namespace is
      used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h
      and <stdatomic.h> can co-exist. I checked /usr/include on my machine and
      searched GitHub for existing "qatomic_" users but there seem to be none.
      
      This patch was generated using:
      
        $ git grep -h -o '\<atomic\(64\)\?_[a-z0-9_]\+' include/qemu/atomic.h | \
          sort -u >/tmp/changed_identifiers
        $ for identifier in $(</tmp/changed_identifiers); do
              sed -i "s%\<$identifier\>%q$identifier%g" \
                  $(git grep -I -l "\<$identifier\>")
          done
      
      I manually fixed line-wrap issues and misaligned rST tables.
      
      Signed-off-by: default avatarStefan Hajnoczi <stefanha@redhat.com>
      Reviewed-by: default avatarPhilippe Mathieu-Daudé <philmd@redhat.com>
      Acked-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20200923105646.47864-1-stefanha@redhat.com>
      d73415a3
  8. Sep 09, 2020
  9. Jul 21, 2020
  10. Jul 10, 2020
    • Markus Armbruster's avatar
      error: Eliminate error_propagate() with Coccinelle, part 1 · 668f62ec
      Markus Armbruster authored
      
      When all we do with an Error we receive into a local variable is
      propagating to somewhere else, we can just as well receive it there
      right away.  Convert
      
          if (!foo(..., &err)) {
              ...
              error_propagate(errp, err);
              ...
              return ...
          }
      
      to
      
          if (!foo(..., errp)) {
              ...
              ...
              return ...
          }
      
      where nothing else needs @err.  Coccinelle script:
      
          @rule1 forall@
          identifier fun, err, errp, lbl;
          expression list args, args2;
          binary operator op;
          constant c1, c2;
          symbol false;
          @@
               if (
          (
          -        fun(args, &err, args2)
          +        fun(args, errp, args2)
          |
          -        !fun(args, &err, args2)
          +        !fun(args, errp, args2)
          |
          -        fun(args, &err, args2) op c1
          +        fun(args, errp, args2) op c1
          )
                  )
               {
                   ... when != err
                       when != lbl:
                       when strict
          -        error_propagate(errp, err);
                   ... when != err
          (
                   return;
          |
                   return c2;
          |
                   return false;
          )
               }
      
          @rule2 forall@
          identifier fun, err, errp, lbl;
          expression list args, args2;
          expression var;
          binary operator op;
          constant c1, c2;
          symbol false;
          @@
          -    var = fun(args, &err, args2);
          +    var = fun(args, errp, args2);
               ... when != err
               if (
          (
                   var
          |
                   !var
          |
                   var op c1
          )
                  )
               {
                   ... when != err
                       when != lbl:
                       when strict
          -        error_propagate(errp, err);
                   ... when != err
          (
                   return;
          |
                   return c2;
          |
                   return false;
          |
                   return var;
          )
               }
      
          @depends on rule1 || rule2@
          identifier err;
          @@
          -    Error *err = NULL;
               ... when != err
      
      Not exactly elegant, I'm afraid.
      
      The "when != lbl:" is necessary to avoid transforming
      
               if (fun(args, &err)) {
                   goto out
               }
               ...
           out:
               error_propagate(errp, err);
      
      even though other paths to label out still need the error_propagate().
      For an actual example, see sclp_realize().
      
      Without the "when strict", Coccinelle transforms vfio_msix_setup(),
      incorrectly.  I don't know what exactly "when strict" does, only that
      it helps here.
      
      The match of return is narrower than what I want, but I can't figure
      out how to express "return where the operand doesn't use @err".  For
      an example where it's too narrow, see vfio_intx_enable().
      
      Silently fails to convert hw/arm/armsse.c, because Coccinelle gets
      confused by ARMSSE being used both as typedef and function-like macro
      there.  Converted manually.
      
      Line breaks tidied up manually.  One nested declaration of @local_err
      deleted manually.  Preexisting unwanted blank line dropped in
      hw/riscv/sifive_e.c.
      
      Signed-off-by: default avatarMarkus Armbruster <armbru@redhat.com>
      Reviewed-by: default avatarEric Blake <eblake@redhat.com>
      Message-Id: <20200707160613.848843-35-armbru@redhat.com>
      668f62ec
    • Markus Armbruster's avatar
      error: Avoid unnecessary error_propagate() after error_setg() · dcfe4805
      Markus Armbruster authored
      
      Replace
      
          error_setg(&err, ...);
          error_propagate(errp, err);
      
      by
      
          error_setg(errp, ...);
      
      Related pattern:
      
          if (...) {
              error_setg(&err, ...);
              goto out;
          }
          ...
       out:
          error_propagate(errp, err);
          return;
      
      When all paths to label out are that way, replace by
      
          if (...) {
              error_setg(errp, ...);
              return;
          }
      
      and delete the label along with the error_propagate().
      
      When we have at most one other path that actually needs to propagate,
      and maybe one at the end that where propagation is unnecessary, e.g.
      
          foo(..., &err);
          if (err) {
              goto out;
          }
          ...
          bar(..., &err);
       out:
          error_propagate(errp, err);
          return;
      
      move the error_propagate() to where it's needed, like
      
          if (...) {
              foo(..., &err);
              error_propagate(errp, err);
              return;
          }
          ...
          bar(..., errp);
          return;
      
      and transform the error_setg() as above.
      
      In some places, the transformation results in obviously unnecessary
      error_propagate().  The next few commits will eliminate them.
      
      Bonus: the elimination of gotos will make later patches in this series
      easier to review.
      
      Candidates for conversion tracked down with this Coccinelle script:
      
          @@
          identifier err, errp;
          expression list args;
          @@
          -    error_setg(&err, args);
          +    error_setg(errp, args);
               ... when != err
               error_propagate(errp, err);
      
      Signed-off-by: default avatarMarkus Armbruster <armbru@redhat.com>
      Reviewed-by: default avatarEric Blake <eblake@redhat.com>
      Message-Id: <20200707160613.848843-34-armbru@redhat.com>
      dcfe4805
    • Markus Armbruster's avatar
      qapi: Use returned bool to check for failure, Coccinelle part · 62a35aaa
      Markus Armbruster authored
      
      The previous commit enables conversion of
      
          visit_foo(..., &err);
          if (err) {
              ...
          }
      
      to
      
          if (!visit_foo(..., errp)) {
              ...
          }
      
      for visitor functions that now return true / false on success / error.
      Coccinelle script:
      
          @@
          identifier fun =~ "check_list|input_type_enum|lv_start_struct|lv_type_bool|lv_type_int64|lv_type_str|lv_type_uint64|output_type_enum|parse_type_bool|parse_type_int64|parse_type_null|parse_type_number|parse_type_size|parse_type_str|parse_type_uint64|print_type_bool|print_type_int64|print_type_null|print_type_number|print_type_size|print_type_str|print_type_uint64|qapi_clone_start_alternate|qapi_clone_start_list|qapi_clone_start_struct|qapi_clone_type_bool|qapi_clone_type_int64|qapi_clone_type_null|qapi_clone_type_number|qapi_clone_type_str|qapi_clone_type_uint64|qapi_dealloc_start_list|qapi_dealloc_start_struct|qapi_dealloc_type_anything|qapi_dealloc_type_bool|qapi_dealloc_type_int64|qapi_dealloc_type_null|qapi_dealloc_type_number|qapi_dealloc_type_str|qapi_dealloc_type_uint64|qobject_input_check_list|qobject_input_check_struct|qobject_input_start_alternate|qobject_input_start_list|qobject_input_start_struct|qobject_input_type_any|qobject_input_type_bool|qobject_input_type_bool_keyval|qobject_input_type_int64|qobject_input_type_int64_keyval|qobject_input_type_null|qobject_input_type_number|qobject_input_type_number_keyval|qobject_input_type_size_keyval|qobject_input_type_str|qobject_input_type_str_keyval|qobject_input_type_uint64|qobject_input_type_uint64_keyval|qobject_output_start_list|qobject_output_start_struct|qobject_output_type_any|qobject_output_type_bool|qobject_output_type_int64|qobject_output_type_null|qobject_output_type_number|qobject_output_type_str|qobject_output_type_uint64|start_list|visit_check_list|visit_check_struct|visit_start_alternate|visit_start_list|visit_start_struct|visit_type_.*";
          expression list args;
          typedef Error;
          Error *err;
          @@
          -    fun(args, &err);
          -    if (err)
          +    if (!fun(args, &err))
               {
                   ...
               }
      
      A few line breaks tidied up manually.
      
      Signed-off-by: default avatarMarkus Armbruster <armbru@redhat.com>
      Reviewed-by: default avatarEric Blake <eblake@redhat.com>
      Reviewed-by: default avatarVladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Message-Id: <20200707160613.848843-19-armbru@redhat.com>
      62a35aaa
  11. May 15, 2020
    • Markus Armbruster's avatar
      qom: Drop parameter @errp of object_property_add() & friends · d2623129
      Markus Armbruster authored
      
      The only way object_property_add() can fail is when a property with
      the same name already exists.  Since our property names are all
      hardcoded, failure is a programming error, and the appropriate way to
      handle it is passing &error_abort.
      
      Same for its variants, except for object_property_add_child(), which
      additionally fails when the child already has a parent.  Parentage is
      also under program control, so this is a programming error, too.
      
      We have a bit over 500 callers.  Almost half of them pass
      &error_abort, slightly fewer ignore errors, one test case handles
      errors, and the remaining few callers pass them to their own callers.
      
      The previous few commits demonstrated once again that ignoring
      programming errors is a bad idea.
      
      Of the few ones that pass on errors, several violate the Error API.
      The Error ** argument must be NULL, &error_abort, &error_fatal, or a
      pointer to a variable containing NULL.  Passing an argument of the
      latter kind twice without clearing it in between is wrong: if the
      first call sets an error, it no longer points to NULL for the second
      call.  ich9_pm_add_properties(), sparc32_ledma_realize(),
      sparc32_dma_realize(), xilinx_axidma_realize(), xilinx_enet_realize()
      are wrong that way.
      
      When the one appropriate choice of argument is &error_abort, letting
      users pick the argument is a bad idea.
      
      Drop parameter @errp and assert the preconditions instead.
      
      There's one exception to "duplicate property name is a programming
      error": the way object_property_add() implements the magic (and
      undocumented) "automatic arrayification".  Don't drop @errp there.
      Instead, rename object_property_add() to object_property_try_add(),
      and add the obvious wrapper object_property_add().
      
      Signed-off-by: default avatarMarkus Armbruster <armbru@redhat.com>
      Reviewed-by: default avatarEric Blake <eblake@redhat.com>
      Reviewed-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20200505152926.18877-15-armbru@redhat.com>
      [Two semantic rebase conflicts resolved]
      d2623129
  12. Mar 08, 2019
  13. Feb 12, 2019
    • Peter Xu's avatar
      iothread: fix iothread hang when stop too soon · 6c95363d
      Peter Xu authored
      
      Lukas reported an hard to reproduce QMP iothread hang on s390 that
      QEMU might hang at pthread_join() of the QMP monitor iothread before
      quitting:
      
        Thread 1
        #0  0x000003ffad10932c in pthread_join
        #1  0x0000000109e95750 in qemu_thread_join
            at /home/thuth/devel/qemu/util/qemu-thread-posix.c:570
        #2  0x0000000109c95a1c in iothread_stop
        #3  0x0000000109bb0874 in monitor_cleanup
        #4  0x0000000109b55042 in main
      
      While the iothread is still in the main loop:
      
        Thread 4
        #0  0x000003ffad0010e4 in ??
        #1  0x000003ffad553958 in g_main_context_iterate.isra.19
        #2  0x000003ffad553d90 in g_main_loop_run
        #3  0x0000000109c9585a in iothread_run
            at /home/thuth/devel/qemu/iothread.c:74
        #4  0x0000000109e94752 in qemu_thread_start
            at /home/thuth/devel/qemu/util/qemu-thread-posix.c:502
        #5  0x000003ffad10825a in start_thread
        #6  0x000003ffad00dcf2 in thread_start
      
      IMHO it's because there's a race between the main thread and iothread
      when stopping the thread in following sequence:
      
          main thread                       iothread
          ===========                       ==============
                                            aio_poll()
          iothread_get_g_main_context
            set iothread->worker_context
          iothread_stop
            schedule iothread_stop_bh
                                              execute iothread_stop_bh [1]
                                                set iothread->running=false
                                                (since main_loop==NULL so
                                                 skip to quit main loop.
                                                 Note: although main_loop is
                                                 NULL but worker_context is
                                                 not!)
                                            atomic_read(&iothread->worker_context) [2]
                                              create main_loop object
                                              g_main_loop_run() [3]
          pthread_join() [4]
      
      We can see that when execute iothread_stop_bh() at [1] it's possible
      that main_loop is still NULL because it's only created until the first
      check of the worker_context later at [2].  Then the iothread will hang
      in the main loop [3] and it'll starve the main thread too [4].
      
      Here the simple solution should be that we check again the "running"
      variable before check against worker_context.
      
      CC: Thomas Huth <thuth@redhat.com>
      CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
      CC: Stefan Hajnoczi <stefanha@redhat.com>
      CC: Lukáš Doktor <ldoktor@redhat.com>
      CC: Markus Armbruster <armbru@redhat.com>
      CC: Eric Blake <eblake@redhat.com>
      CC: Paolo Bonzini <pbonzini@redhat.com>
      Reported-by: default avatarLukáš Doktor <ldoktor@redhat.com>
      Signed-off-by: default avatarPeter Xu <peterx@redhat.com>
      Tested-by: default avatarThomas Huth <thuth@redhat.com>
      Message-id: 20190129051432.22023-1-peterx@redhat.com
      Signed-off-by: default avatarStefan Hajnoczi <stefanha@redhat.com>
      6c95363d
  14. Oct 12, 2018
  15. Apr 10, 2018
    • Peter Xu's avatar
      iothread: workaround glib bug which hangs qmp-test · 15544349
      Peter Xu authored
      
      Free the AIO context earlier than the GMainContext (if we have) to
      workaround a glib2 bug that GSource context pointer is not cleared even
      if the context has already been destroyed (while it should).
      
      The patch itself only changed the order to destroy the objects, no
      functional change at all. Without this workaround, we can encounter
      qmp-test hang with oob (and possibly any other use case when iothread is
      used with GMainContexts):
      
        #0  0x00007f35ffe45334 in __lll_lock_wait () from /lib64/libpthread.so.0
        #1  0x00007f35ffe405d8 in _L_lock_854 () from /lib64/libpthread.so.0
        #2  0x00007f35ffe404a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
        #3  0x00007f35fc5b9c9d in g_source_unref_internal (source=0x24f0600, context=0x7f35f0000960, have_lock=0) at gmain.c:1685
        #4  0x0000000000aa6672 in aio_context_unref (ctx=0x24f0600) at /root/qemu/util/async.c:497
        #5  0x000000000065851c in iothread_instance_finalize (obj=0x24f0380) at /root/qemu/iothread.c:129
        #6  0x0000000000962d79 in object_deinit (obj=0x24f0380, type=0x242e960) at /root/qemu/qom/object.c:462
        #7  0x0000000000962e0d in object_finalize (data=0x24f0380) at /root/qemu/qom/object.c:476
        #8  0x0000000000964146 in object_unref (obj=0x24f0380) at /root/qemu/qom/object.c:924
        #9  0x0000000000965880 in object_finalize_child_property (obj=0x24ec640, name=0x24efca0 "mon_iothread", opaque=0x24f0380) at /root/qemu/qom/object.c:1436
        #10 0x0000000000962c33 in object_property_del_child (obj=0x24ec640, child=0x24f0380, errp=0x0) at /root/qemu/qom/object.c:436
        #11 0x0000000000962d26 in object_unparent (obj=0x24f0380) at /root/qemu/qom/object.c:455
        #12 0x0000000000658f00 in iothread_destroy (iothread=0x24f0380) at /root/qemu/iothread.c:365
        #13 0x00000000004c67a8 in monitor_cleanup () at /root/qemu/monitor.c:4663
        #14 0x0000000000669e27 in main (argc=16, argv=0x7ffc8b1ae2f8, envp=0x7ffc8b1ae380) at /root/qemu/vl.c:4749
      
      The glib2 bug is fixed in commit 26056558b ("gmain: allow
      g_source_get_context() on destroyed sources", 2012-07-30), so the first
      good version is glib2 2.33.10. But we still support building with
      glib as old as 2.28, so we need the workaround.
      
      Let's make sure we destroy the GSources first before its owner context
      until we drop support for glib older than 2.33.10.
      
      Signed-off-by: default avatarPeter Xu <peterx@redhat.com>
      Message-Id: <20180409083956.1780-1-peterx@redhat.com>
      Reviewed-by: default avatarEric Blake <eblake@redhat.com>
      Reviewed-by: default avatarStefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: default avatarEric Blake <eblake@redhat.com>
      15544349
  16. Mar 26, 2018
  17. Mar 08, 2018
    • Stefan Hajnoczi's avatar
      vl: introduce vm_shutdown() · 4486e89c
      Stefan Hajnoczi authored
      
      Commit 00d09fdb ("vl: pause vcpus before
      stopping iothreads") and commit dce8921b
      ("iothread: Stop threads before main() quits") tried to work around the
      fact that emulation was still active during termination by stopping
      iothreads.  They suffer from race conditions:
      1. virtio_scsi_handle_cmd_vq() racing with iothread_stop_all() hits the
         virtio_scsi_ctx_check() assertion failure because the BDS AioContext
         has been modified by iothread_stop_all().
      2. Guest vq kick racing with main loop termination leaves a readable
         ioeventfd that is handled by the next aio_poll() when external
         clients are enabled again, resulting in unwanted emulation activity.
      
      This patch obsoletes those commits by fully disabling emulation activity
      when vcpus are stopped.
      
      Use the new vm_shutdown() function instead of pause_all_vcpus() so that
      vm change state handlers are invoked too.  Virtio devices will now stop
      their ioeventfds, preventing further emulation activity after vm_stop().
      
      Note that vm_stop(RUN_STATE_SHUTDOWN) cannot be used because it emits a
      QMP STOP event that may affect existing clients.
      
      It is no longer necessary to call replay_disable_events() directly since
      vm_shutdown() does so already.
      
      Drop iothread_stop_all() since it is no longer used.
      
      Cc: Fam Zheng <famz@redhat.com>
      Cc: Kevin Wolf <kwolf@redhat.com>
      Signed-off-by: default avatarStefan Hajnoczi <stefanha@redhat.com>
      Reviewed-by: default avatarFam Zheng <famz@redhat.com>
      Acked-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Message-id: 20180307144205.20619-5-stefanha@redhat.com
      Signed-off-by: default avatarStefan Hajnoczi <stefanha@redhat.com>
      4486e89c
  18. Mar 02, 2018
    • Markus Armbruster's avatar
      qapi: Empty out qapi-schema.json · 112ed241
      Markus Armbruster authored
      
      The previous commit improved compile time by including less of the
      generated QAPI headers.  This is impossible for stuff defined directly
      in qapi-schema.json, because that ends up in headers that that pull in
      everything.
      
      Move everything but include directives from qapi-schema.json to new
      sub-module qapi/misc.json, then include just the "misc" shard where
      possible.
      
      It's possible everywhere, except:
      
      * monitor.c needs qmp-command.h to get qmp_init_marshal()
      
      * monitor.c, ui/vnc.c and the generated qapi-event-FOO.c need
        qapi-event.h to get enum QAPIEvent
      
      Perhaps we'll get rid of those some other day.
      
      Adding a type to qapi/migration.json now recompiles some 120 instead
      of 2300 out of 5100 objects.
      
      Signed-off-by: default avatarMarkus Armbruster <armbru@redhat.com>
      Message-Id: <20180211093607.27351-25-armbru@redhat.com>
      [eblake: rebase to master]
      Signed-off-by: default avatarEric Blake <eblake@redhat.com>
      112ed241
  19. Feb 09, 2018
  20. Dec 19, 2017
    • Stefan Hajnoczi's avatar
      iothread: fix iothread_stop() race condition · 2362a28e
      Stefan Hajnoczi authored
      
      There is a small chance that iothread_stop() hangs as follows:
      
        Thread 3 (Thread 0x7f63eba5f700 (LWP 16105)):
        #0  0x00007f64012c09b6 in ppoll () at /lib64/libc.so.6
        #1  0x000055959992eac9 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
        #2  0x000055959992eac9 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:322
        #3  0x0000559599930711 in aio_poll (ctx=0x55959bdb83c0, blocking=blocking@entry=true) at util/aio-posix.c:629
        #4  0x00005595996806fe in iothread_run (opaque=0x55959bd78400) at iothread.c:59
        #5  0x00007f640159f609 in start_thread () at /lib64/libpthread.so.0
        #6  0x00007f64012cce6f in clone () at /lib64/libc.so.6
      
        Thread 1 (Thread 0x7f640b45b280 (LWP 16103)):
        #0  0x00007f64015a0b6d in pthread_join () at /lib64/libpthread.so.0
        #1  0x00005595999332ef in qemu_thread_join (thread=<optimized out>) at util/qemu-thread-posix.c:547
        #2  0x00005595996808ae in iothread_stop (iothread=<optimized out>) at iothread.c:91
        #3  0x000055959968094d in iothread_stop_iter (object=<optimized out>, opaque=<optimized out>) at iothread.c:102
        #4  0x0000559599857d97 in do_object_child_foreach (obj=obj@entry=0x55959bdb8100, fn=fn@entry=0x559599680930 <iothread_stop_iter>, opaque=opaque@entry=0x0, recurse=recurse@entry=false) at qom/object.c:852
        #5  0x0000559599859477 in object_child_foreach (obj=obj@entry=0x55959bdb8100, fn=fn@entry=0x559599680930 <iothread_stop_iter>, opaque=opaque@entry=0x0) at qom/object.c:867
        #6  0x0000559599680a6e in iothread_stop_all () at iothread.c:341
        #7  0x000055959955b1d5 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4913
      
      The relevant code from iothread_run() is:
      
        while (!atomic_read(&iothread->stopping)) {
            aio_poll(iothread->ctx, true);
      
      and iothread_stop():
      
        iothread->stopping = true;
        aio_notify(iothread->ctx);
        ...
        qemu_thread_join(&iothread->thread);
      
      The following scenario can occur:
      
      1. IOThread:
        while (!atomic_read(&iothread->stopping)) -> stopping=false
      
      2. Main loop:
        iothread->stopping = true;
        aio_notify(iothread->ctx);
      
      3. IOThread:
        aio_poll(iothread->ctx, true); -> hang
      
      The bug is explained by the AioContext->notify_me doc comments:
      
        "If this field is 0, everything (file descriptors, bottom halves,
        timers) will be re-evaluated before the next blocking poll(), thus the
        event_notifier_set call can be skipped."
      
      The problem is that "everything" does not include checking
      iothread->stopping.  This means iothread_run() will block in aio_poll()
      if aio_notify() was called just before aio_poll().
      
      This patch fixes the hang by replacing aio_notify() with
      aio_bh_schedule_oneshot().  This makes aio_poll() or g_main_loop_run()
      to return.
      
      Implementing this properly required a new bool running flag.  The new
      flag prevents races that are tricky if we try to use iothread->stopping.
      Now iothread->stopping is purely for iothread_stop() and
      iothread->running is purely for the iothread_run() thread.
      
      Signed-off-by: default avatarStefan Hajnoczi <stefanha@redhat.com>
      Reviewed-by: default avatarEric Blake <eblake@redhat.com>
      Message-id: 20171207201320.19284-6-stefanha@redhat.com
      Signed-off-by: default avatarStefan Hajnoczi <stefanha@redhat.com>
      2362a28e
    • Stefan Hajnoczi's avatar
      iothread: add iothread_by_id() API · fbcc6923
      Stefan Hajnoczi authored
      
      Encapsulate IOThread QOM object lookup so that callers don't need to
      know how and where IOThread objects live.
      
      Signed-off-by: default avatarStefan Hajnoczi <stefanha@redhat.com>
      Reviewed-by: default avatarKevin Wolf <kwolf@redhat.com>
      Reviewed-by: default avatarEric Blake <eblake@redhat.com>
      Message-id: 20171206144550.22295-8-stefanha@redhat.com
      Signed-off-by: default avatarStefan Hajnoczi <stefanha@redhat.com>
      fbcc6923
  21. Oct 03, 2017
  22. Sep 29, 2017
  23. Sep 08, 2017
  24. Feb 21, 2017
  25. Feb 03, 2017
    • Stefan Hajnoczi's avatar
      iothread: enable AioContext polling by default · cdd7abfd
      Stefan Hajnoczi authored
      
      IOThread AioContexts are likely to consist only of event sources like
      virtqueue ioeventfds and LinuxAIO completion eventfds that are pollable
      from userspace (without system calls).
      
      We recently merged the AioContext polling feature but didn't enable it
      by default yet.  I have gone back over the performance data on the
      mailing list and picked a default polling value that gave good results.
      
      Let's enable AioContext polling by default so users don't have another
      switch they need to set manually.  If performance regressions are found
      we can still disable this for the QEMU 2.9 release.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Karl Rister <krister@redhat.com>
      Signed-off-by: default avatarStefan Hajnoczi <stefanha@redhat.com>
      Message-id: 20170126170119.27876-1-stefanha@redhat.com
      Signed-off-by: default avatarStefan Hajnoczi <stefanha@redhat.com>
      cdd7abfd
  26. Jan 03, 2017
Loading