  1. Apr 24, 2017
  2. Apr 21, 2017
    • throttle: make throttle_config(throttle_get_config()) symmetric · d72915c6
      Stefan Hajnoczi authored
      
      Throttling has a weird property that throttle_get_config() does not
      always return the same throttling settings that were given with
      throttle_config().  In other words, the set and get functions aren't
      symmetric.
      
      If .max is 0 then the throttling code assigns a default value of .avg /
      10 in throttle_config().  This is an implementation detail of the
      throttling algorithm.  When throttle_get_config() is called the .max
      value returned should still be 0.
      
      Users are exposed to this quirk via "info block" or "query-block"
      monitor commands.  This has caused confusion because it looks like a bug
      when an unexpected value is reported.
      
      This patch hides the .max value adjustment in throttle_get_config() and
      updates test-throttle.c appropriately.
      
      Reported-by: Nini Gu <ngu@redhat.com>
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Reviewed-by: Alberto Garcia <berto@igalia.com>
      Message-id: 20170301115026.22621-4-stefanha@redhat.com
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      d72915c6
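A minimal sketch of the get/set symmetry described above. The `Bucket` type and both function names are hypothetical simplifications, and the sketch assumes an explicit `.max` below `.avg` is rejected as invalid, so any such value seen by the getter must be the internal default. The actual patch may undo the adjustment differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one throttling bucket. */
typedef struct {
    uint64_t avg;  /* average rate requested by the user */
    uint64_t max;  /* burst rate; 0 means "not set" */
} Bucket;

/* Setter: fill in the internal default burst of avg / 10 when max is unset. */
static void bucket_config(Bucket *state, const Bucket *user)
{
    /* Assumption of this sketch: an explicit max below avg is invalid. */
    assert(user->max == 0 || user->max >= user->avg);
    *state = *user;
    if (state->avg && !state->max) {
        state->max = state->avg / 10;
    }
}

/* Getter: undo the adjustment so callers see what they configured. */
static void bucket_get_config(const Bucket *state, Bucket *user)
{
    *user = *state;
    if (user->max < user->avg) {
        user->max = 0;  /* only the internal default can be below avg */
    }
}
```

With this shape, reading the configuration back and feeding it to the setter again leaves the state unchanged.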
  3. Apr 11, 2017
  4. Apr 03, 2017
    • main-loop: Acquire main_context lock around os_host_main_loop_wait. · ecbddbb1
      Richard W.M. Jones authored
      When running virt-rescue the serial console hangs from time to time.
      Virt-rescue runs an ordinary Linux kernel "appliance", but there is
      only a single idle process running inside, so the qemu main loop is
      largely idle.  With virt-rescue >= 1.37 you may be able to observe the
      hang by doing:
      
        $ virt-rescue -e ^] --scratch
        ><rescue> while true; do ls -l /usr/bin; done
      
      The hang in virt-rescue can be resolved by pressing a key on the
      serial console.
      
      Possibly with the same root cause, we also observed hangs during very
      early boot of regular Linux VMs with a serial console.  Those hangs
      are extremely rare, but you may be able to observe them by running
      this command on baremetal for a sufficiently long time:
      
        $ while libguestfs-test-tool -t 60 >& /tmp/log ; do echo -n . ; done
      
      (Check in /tmp/log that the failure was caused by a hang during early
      boot, and not some other reason)
      
      During investigation of this bug, Paolo Bonzini wrote:
      
      > glib is expecting QEMU to use g_main_context_acquire around accesses to
      > GMainContext.  However QEMU is not doing that, instead it is taking its
      > own mutex.  So we should add g_main_context_acquire and
      > g_main_context_release in the two implementations of
      > os_host_main_loop_wait; these should undo the effect of Frediano's
      > glib patch.
      
      This patch exactly implements Paolo's suggestion in that paragraph.
      
      This fixes the serial console hang in my testing, across 3 different
      physical machines (AMD, Intel Core i7 and Intel Xeon), over many hours
      of automated testing.  I wasn't able to reproduce the early boot hangs
      (but as noted above, these are extremely rare in any case).
      
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1435432
      
      
      Reported-by: Richard W.M. Jones <rjones@redhat.com>
      Tested-by: Richard W.M. Jones <rjones@redhat.com>
      Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
      Message-Id: <20170331205133.23906-1-rjones@redhat.com>
      [Paolo: this is actually a glib bug: recent glib versions are also
      expecting g_main_context_acquire around g_poll---but that is not
      documented and probably not even intended].
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ecbddbb1
    • sockets: New helper socket_address_crumple() · 216411b8
      Markus Armbruster authored
      
      SocketAddress is a simple union, and simple unions are awkward: they
      have their variant members wrapped in a "data" object on the wire, and
      require additional indirections in C.  I intend to limit its use to
      existing external interfaces.  New ones should use SocketAddressFlat.
      I further intend to convert all internal interfaces to
      SocketAddressFlat.  This helper should go away then.
      
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Message-id: 1490895797-29094-8-git-send-email-armbru@redhat.com
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      216411b8
    • io vnc sockets: Clean up SocketAddressKind switches · a6c76285
      Markus Armbruster authored
      
      We have quite a few switches over SocketAddressKind.  Some have case
      labels for all enumeration values, others rely on a default label.
      Some abort when the value isn't a valid SocketAddressKind, others
      report an error then.
      
      Unify as follows.  Always provide case labels for all enumeration
      values, to clarify intent.  Abort when the value isn't a valid
      SocketAddressKind, because the program state is messed up then.
      
      Improve a few error messages while there.
      
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Message-id: 1490895797-29094-4-git-send-email-armbru@redhat.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      a6c76285
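The unified switch style described above might look like the following sketch. `AddrKind` and its enumerators are hypothetical stand-ins for the generated SocketAddressKind type, not the real definitions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the generated SocketAddressKind enum. */
typedef enum {
    ADDR_KIND_INET,
    ADDR_KIND_UNIX,
    ADDR_KIND_VSOCK,
    ADDR_KIND_FD,
} AddrKind;

static const char *addr_kind_name(AddrKind kind)
{
    /* One case label per enumeration value, and no default label. */
    switch (kind) {
    case ADDR_KIND_INET:
        return "inet";
    case ADDR_KIND_UNIX:
        return "unix";
    case ADDR_KIND_VSOCK:
        return "vsock";
    case ADDR_KIND_FD:
        return "fd";
    }
    /* Reached only if kind is not a valid AddrKind: state is corrupt. */
    assert(!"invalid AddrKind");
    abort();  /* still abort when NDEBUG disables assert() */
}
```

Leaving out the default label lets compilers that warn about unhandled enumeration values flag the switch when a new kind is added.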
    • nbd sockets vnc: Mark problematic address family tests TODO · ca0b64e5
      Markus Armbruster authored
      
      Certain features make sense only with certain address families.  For
      instance, passing file descriptors requires AF_UNIX.  Testing
      SocketAddress's saddr->type == SOCKET_ADDRESS_KIND_UNIX is obvious,
      but problematic: it can't recognize AF_UNIX when type ==
      SOCKET_ADDRESS_KIND_FD.
      
      Mark such tests of saddr->type TODO.  We may want to check the address
      family with getsockname() there.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gerd Hoffmann <kraxel@redhat.com>
      Cc: Daniel P. Berrange <berrange@redhat.com>
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-id: 1490895797-29094-2-git-send-email-armbru@redhat.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      ca0b64e5
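The getsockname() check suggested above could be sketched as follows. `fd_is_af_unix` is a hypothetical helper and is not part of the patch, which only marks the existing tests TODO.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/socket.h>

/*
 * Return true if fd is a socket in the AF_UNIX family, however the
 * descriptor was obtained (connected by us, or passed in by number).
 */
static bool fd_is_af_unix(int fd)
{
    struct sockaddr_storage ss = { 0 };
    socklen_t len = sizeof(ss);

    assert(fd >= 0);
    if (getsockname(fd, (struct sockaddr *)&ss, &len) < 0) {
        return false;  /* not a socket, or some other error */
    }
    return ss.ss_family == AF_UNIX;
}
```

Unlike a test of saddr->type, this asks the kernel about the descriptor itself, so it also covers sockets that were handed over as plain file descriptors.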
  5. Mar 28, 2017
    • event_notifier: prevent accidental use after close · aa262928
      Halil Pasic authored
      
      Let's set the handles to the underlying facilities to their extremal
      value so no accidental misuse can happen, and to make it obvious that the
      notifier is dysfunctional. E.g. if we just close an fd but do not touch
      the int holding the fd, a later read/write could succeed again once
      the fd gets reused, and corrupt the file addressed by the fd.
      
      Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      aa262928
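A small sketch of the idea, using a pipe-backed notifier. The `Notifier` type and the three functions are hypothetical; the real event notifier has a different layout and API.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical, pipe-backed stand-in for an event notifier. */
typedef struct {
    int rfd;
    int wfd;
} Notifier;

static int notifier_init(Notifier *n)
{
    int fds[2];

    if (pipe(fds) < 0) {
        n->rfd = n->wfd = -1;
        return -1;
    }
    n->rfd = fds[0];
    n->wfd = fds[1];
    return 0;
}

static void notifier_cleanup(Notifier *n)
{
    assert(n->rfd >= 0 && n->wfd >= 0);  /* cleaning up twice is a bug */
    close(n->rfd);
    close(n->wfd);
    /* Poison the handles so a stale write cannot hit a reused fd number. */
    n->rfd = -1;
    n->wfd = -1;
}

/* Returns 0 on success, -1 on error (always -1 after cleanup). */
static int notifier_set(Notifier *n)
{
    char byte = 1;

    return write(n->wfd, &byte, 1) == 1 ? 0 : -1;
}
```

After cleanup, any accidental use fails immediately on the invalid descriptor instead of writing to whatever file later reuses the old fd number.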
    • sockets: Fix socket_address_to_string() hostname truncation · 44fdc764
      Markus Armbruster authored
      
      We first snprintf() to a fixed buffer, then g_strdup() the result
      *boggle*.
      
      Worse, the size of the fixed buffer INET6_ADDRSTRLEN + 5 + 4 is bogus:
      the 4 correctly accounts for '[', ']', ':' and '\0', but
      INET6_ADDRSTRLEN is not a suitable limit for inet->host, and 5 is not
      one for inet->port!  They are for host and port in *numeric* form
      (exploiting that INET6_ADDRSTRLEN > INET_ADDRSTRLEN), but inet->host
      can also be a hostname, and inet->port can be a service name, to be
      resolved with getaddrinfo().
      
      Fortunately, the only user so far is the "socket" network backend's
      net_socket_connected(), which uses it to initialize a NetSocketState's
      info_str[].  info_str[] has considerably more space: 256 instead of
      55.  So the bug's impact appears to be limited to truncated "info
      networks" with the "socket" network backend.
      
      The fix is obvious: use g_strdup_printf().
      
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Message-Id: <1490268208-23368-1-git-send-email-armbru@redhat.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      44fdc764
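To illustrate why an allocating formatter avoids the truncation, here is a standalone sketch. It substitutes a two-pass snprintf() plus malloc() for g_strdup_printf() so that it compiles without GLib; `inet_to_string` is a hypothetical name, and bracketing hosts that contain ':' is an assumption about the desired output format.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Format host and port into a freshly allocated string of exactly the
 * needed size.  The caller releases the result with free().
 */
static char *inet_to_string(const char *host, const char *port)
{
    /* Bracket hosts containing ':' (numeric IPv6) so the port stays unambiguous. */
    int bracket = strchr(host, ':') != NULL;
    const char *open = bracket ? "[" : "";
    const char *close = bracket ? "]" : "";
    int len = snprintf(NULL, 0, "%s%s%s:%s", open, host, close, port);
    char *str;

    assert(len >= 0);
    str = malloc((size_t)len + 1);
    if (str) {
        snprintf(str, (size_t)len + 1, "%s%s%s:%s", open, host, close, port);
    }
    return str;
}
```

Because the buffer is sized from the actual arguments, hostnames and service names of any length survive intact.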
  6. Mar 27, 2017
    • win32: replace custom mutex and condition variable with native primitives · 12f8def0
      Andrey Shedel authored
      
      The multithreaded TCG implementation exposed deadlocks in the win32
      condition variables: as implemented, qemu_cond_broadcast waited on
      receivers, whereas the pthreads API it was intended to emulate does
      not. This caused a deadlock because broadcast was called while
      holding the IO lock, and all possible waiters were blocked on that
      same lock.
      
      This patch replaces all the custom synchronisation code for mutexes
      and condition variables with native Windows primitives (SRWlocks and
      condition variables) with the same semantics as their POSIX
      equivalents. To enable that, it requires a Windows Vista or newer host
      OS.
      
      Signed-off-by: Andrey Shedel <ashedel@microsoft.com>
      [AB: edited commit message]
      Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
      Message-Id: <20170324220141.10104-1-Andrew.Baumann@microsoft.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      12f8def0
  7. Mar 24, 2017
  8. Mar 21, 2017
  9. Mar 19, 2017
    • qemu-ga: obey LISTEN_PID when using systemd socket activation · 53fabd4b
      Paolo Bonzini authored
      
      qemu-ga's socket activation support was not obeying the LISTEN_PID
      environment variable, which prevents a process from using a
      socket-activation file descriptor meant for its parent.
      
      A mess can ensue, for example, if a process forks a child before
      consuming the socket-activation file descriptor and therefore before
      setting O_CLOEXEC on it.
      
      Luckily, qemu-nbd also got socket activation code, and its copy does
      support LISTEN_PID.  Some extra fixups are needed to ensure that the
      code can be used for both, but that's what this patch does.  The
      main change is to replace get_listen_fds's "consume" argument with
      the FIRST_SOCKET_ACTIVATION_FD macro from the qemu-nbd code.
      
      Cc: "Richard W.M. Jones" <rjones@redhat.com>
      Cc: Stefan Hajnoczi <stefanha@redhat.com>
      Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      53fabd4b
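The LISTEN_PID check can be sketched as below. Only the LISTEN_PID and LISTEN_FDS variables and the FIRST_SOCKET_ACTIVATION_FD name come from the text above; the function names, the signature, and the value 3 for the macro (the systemd convention for the first passed descriptor) are assumptions of this sketch rather than the shared helper's real interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Passed fds are FIRST_SOCKET_ACTIVATION_FD .. FIRST_SOCKET_ACTIVATION_FD + n - 1. */
#define FIRST_SOCKET_ACTIVATION_FD 3

/* Parse a non-negative decimal number; return -1 if str is not one. */
static long parse_number(const char *str)
{
    char *end;
    long val;

    if (!str || !*str) {
        return -1;
    }
    errno = 0;
    val = strtol(str, &end, 10);
    if (errno || *end || val < 0) {
        return -1;
    }
    return val;
}

/*
 * Return how many fds were passed to process `self`, given the values of
 * LISTEN_PID and LISTEN_FDS.  Returns 0 if they are meant for another
 * process (for example our parent) or if either variable is malformed.
 */
static unsigned count_listen_fds(const char *listen_pid,
                                 const char *listen_fds, pid_t self)
{
    long pid = parse_number(listen_pid);
    long nfds = parse_number(listen_fds);

    assert(self > 0);
    if (pid < 0 || pid != (long)self || nfds <= 0) {
        return 0;
    }
    return (unsigned)nfds;
}

/* Convenience wrapper reading the real environment. */
static unsigned check_socket_activation(void)
{
    return count_listen_fds(getenv("LISTEN_PID"), getenv("LISTEN_FDS"),
                            getpid());
}
```

Taking the environment values as parameters keeps the PID comparison easy to exercise without modifying the process environment.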
  10. Mar 17, 2017
  11. Mar 15, 2017
  12. Mar 14, 2017
    • icount: process QEMU_CLOCK_VIRTUAL timers in vCPU thread · 6b8f0187
      Paolo Bonzini authored
      
      icount has become much slower after tcg_cpu_exec has stopped
      using the BQL.  There is also a latent bug that is masked by
      the slowness.
      
      The slowness happens because every occurrence of a QEMU_CLOCK_VIRTUAL
      timer now has to wake up the I/O thread and wait for it.  The rendez-vous
      is mediated by the BQL QemuMutex:
      
      - handle_icount_deadline wakes up the I/O thread with BQL taken
      - the I/O thread wakes up and waits on the BQL
      - the VCPU thread releases the BQL a little later
      - the I/O thread raises an interrupt, which calls qemu_cpu_kick
      - the VCPU thread notices the interrupt, takes the BQL to
        process it and waits on it
      
      All this back and forth is extremely expensive, causing a 6 to 8-fold
      slowdown when icount is turned on.
      
      One may think that the issue is that the VCPU thread is too dependent
      on the BQL, but then the latent bug comes in.  I first tried removing
      the BQL completely from the x86 cpu_exec, only to see everything break.
      The only way to fix it (and make everything slow again) was to add a dummy
      BQL lock/unlock pair.
      
      This is because in -icount mode you really have to process the events
      before the CPU restarts executing the next instruction.  Therefore, this
      series moves the processing of QEMU_CLOCK_VIRTUAL timers straight into
      the vCPU thread when running in icount mode.
      
      The required changes include:
      
      - make the timer notification callback wake up TCG's single vCPU thread
        when run from another thread.  By using async_run_on_cpu, the callback
        can override all_cpu_threads_idle() when the CPU is halted.
      
      - move handle_icount_deadline after qemu_tcg_wait_io_event, so that
        the timer notification callback is invoked after the dummy work item
        wakes up the vCPU thread
      
      - make handle_icount_deadline run the timers instead of just waking the
        I/O thread.
      
      - stop processing the timers in the main loop
      
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      6b8f0187
    • cpus: define QEMUTimerListNotifyCB for QEMU system emulation · 3f53bc61
      Paolo Bonzini authored
      
      There is no change for now, because the callback just invokes
      qemu_notify_event.
      
      Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      3f53bc61
    • qemu-timer: do not include sysemu/cpus.h from util/qemu-timer.h · d2528bdc
      Paolo Bonzini authored
      
      This dependency points the wrong way, and we will need util/qemu-timer.h from
      sysemu/cpus.h in the next patch.
      
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d2528bdc
    • qemu-timer: fix off-by-one · 33bef0b9
      Paolo Bonzini authored
      
      If the first timer is exactly at the current value of the clock, the
      deadline is met and the timer should fire.  This fixes itself on the next
      iteration of the loop without icount; with icount, however, execution
      of instructions will stop exactly at the deadline and won't proceed.
      
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      33bef0b9
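The boundary condition can be illustrated with a sketch like this one. Both helpers are hypothetical and only model the comparison; they are not the timer list code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: has a timer with this expiry time fired by `now`? */
static bool timer_expired(int64_t expire_time_ns, int64_t now_ns)
{
    /* "<=" rather than "<": a timer due exactly now has met its deadline. */
    return expire_time_ns <= now_ns;
}

/* Nanoseconds until the timer fires; 0 means "run it right away". */
static int64_t deadline_ns(int64_t expire_time_ns, int64_t now_ns)
{
    int64_t delta = timer_expired(expire_time_ns, now_ns)
                    ? 0 : expire_time_ns - now_ns;

    assert(delta >= 0);
    return delta;
}
```

With a strict "<" the deadline would be reported as 0 while the timer is still treated as pending, which is harmless when the clock keeps advancing but can stall when the clock only moves with executed instructions.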
    • util: Removed unneeded header from path.c · bd5d983f
      Suramya Shah authored
      
      Signed-off-by: Suramya Shah <shah.suramya@gmail.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20170310163948.7567-1-shah.suramya@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      bd5d983f
    • mem-prealloc: reduce large guest start-up and migration time. · 1e356fc1
      Jitendra Kolhe authored
      
      Using the "-mem-prealloc" option for a large guest leads to higher guest
      start-up and migration time. This is because with "-mem-prealloc"
      qemu tries to map every guest page (create address translations) and
      make sure the pages are available during runtime. virsh/libvirt seems
      to use "-mem-prealloc" by default when the guest is configured to use
      huge pages. The patch tries to map all guest pages simultaneously by
      spawning multiple threads. For now the change is limited to QEMU
      library functions on POSIX-compliant hosts, as we are not sure whether
      the problem exists on win32. Below are some stats with the
      "-mem-prealloc" option for a guest configured to use huge pages.
      
      ------------------------------------------------------------------------
      Idle Guest      | Start-up time | Migration time
      ------------------------------------------------------------------------
      Guest stats with 2M HugePage usage - single threaded (existing code)
      ------------------------------------------------------------------------
      64 Core - 4TB   | 54m11.796s    | 75m43.843s
      64 Core - 1TB   | 8m56.576s     | 14m29.049s
      64 Core - 256GB | 2m11.245s     | 3m26.598s
      ------------------------------------------------------------------------
      Guest stats with 2M HugePage usage - map guest pages using 8 threads
      ------------------------------------------------------------------------
      64 Core - 4TB   | 5m1.027s      | 34m10.565s
      64 Core - 1TB   | 1m10.366s     | 8m28.188s
      64 Core - 256GB | 0m19.040s     | 2m10.148s
      -----------------------------------------------------------------------
      Guest stats with 2M HugePage usage - map guest pages using 16 threads
      -----------------------------------------------------------------------
      64 Core - 4TB   | 1m58.970s     | 31m43.400s
      64 Core - 1TB   | 0m39.885s     | 7m55.289s
      64 Core - 256GB | 0m11.960s     | 2m0.135s
      -----------------------------------------------------------------------
      
      Changed in v2:
       - modify number of memset threads spawned to min(smp_cpus, 16).
       - removed 64GB memory restriction for spawning memset threads.
      
      Changed in v3:
       - limit number of threads spawned based on
         min(sysconf(_SC_NPROCESSORS_ONLN), 16, smp_cpus)
       - implement memset thread specific siglongjmp in SIGBUS signal_handler.
      
      Changed in v4:
       - remove sigsetjmp/siglongjmp and SIGBUS unblock/block for main thread
         as main thread no longer touches any pages.
       - simplify code by returning memset_thread_failed status from
         touch_all_pages.
      
      Signed-off-by: Jitendra Kolhe <jitendra.kolhe@hpe.com>
      Message-Id: <1487907103-32350-1-git-send-email-jitendra.kolhe@hpe.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      1e356fc1
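The thread-count rule from the v3 notes and one possible way of dividing the pages among threads can be sketched as follows. The sketch only computes the split and spawns no threads; `PageRange`, `prealloc_thread_count` and `split_pages` are hypothetical names, and giving the remainder to the last range is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_MEM_PREALLOC_THREADS 16  /* cap mentioned in the v3 notes */

/* Hypothetical description of the pages one thread would touch. */
typedef struct {
    size_t first_page;
    size_t num_pages;
} PageRange;

/* min(online host CPUs, 16, guest CPUs), but never less than 1. */
static int prealloc_thread_count(int smp_cpus)
{
    long online = sysconf(_SC_NPROCESSORS_ONLN);
    int n = 1;

    if (online > 0) {
        n = online < MAX_MEM_PREALLOC_THREADS ? (int)online
                                              : MAX_MEM_PREALLOC_THREADS;
        if (smp_cpus < n) {
            n = smp_cpus;
        }
    }
    return n > 0 ? n : 1;
}

/* Split num_pages into nthreads contiguous ranges; the last takes the remainder. */
static void split_pages(size_t num_pages, int nthreads, PageRange *out)
{
    size_t per_thread;

    assert(nthreads > 0);
    per_thread = num_pages / (size_t)nthreads;
    for (int i = 0; i < nthreads; i++) {
        out[i].first_page = (size_t)i * per_thread;
        out[i].num_pages = per_thread;
    }
    out[nthreads - 1].num_pages += num_pages % (size_t)nthreads;
}
```

Each range would then be handed to one thread that writes to every page in it, so that the kernel maps the pages in parallel.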
  13. Mar 07, 2017
    • keyval: Support lists · 0b2c1bee
      Markus Armbruster authored
      
      Additionally permit non-negative integers as key components.  A
      dictionary's keys must either be all integers or none.  If all keys
      are integers, convert the dictionary to a list.  The set of keys must
      be [0,N].
      
      Examples:
      
      * list.1=goner,list.0=null,list.1=eins,list.2=zwei
        is equivalent to JSON [ "null", "eins", "zwei" ]
      
      * a.b.c=1,a.b.0=2
        is inconsistent: a.b.c clashes with a.b.0
      
      * list.0=null,list.2=eins,list.2=zwei
        has a hole: list.1 is missing
      
      Similar design flaw as for objects: there is no way to denote an empty
      list.  While interpreting "key absent" as empty list seems natural
      (removing a list member from the input string works when there are
      multiple ones, so why not when there's just one), it doesn't work:
      "key absent" already means "optional list absent", which isn't the
      same as "empty list present".
      
      Update the keyval object visitor to use this a.0 syntax in error
      messages rather than the usual a[0].
      
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Message-Id: <1488317230-26248-25-git-send-email-armbru@redhat.com>
      [Off-by-one fix squashed in, as per Kevin's review]
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      0b2c1bee
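The list rule above (all keys integers or none, and no holes) can be sketched as a classification of one dictionary's keys. `KeysKind` and both functions are hypothetical, the keys are assumed to be distinct strings as in a dictionary, and the sketch requires the indices to form a gap-free range starting at 0.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum {
    KEYS_DICT,       /* no key is an integer: stays a dictionary */
    KEYS_LIST,       /* keys are exactly 0 .. n-1: becomes a list */
    KEYS_ERR_MIXED,  /* some keys are integers, some are not */
    KEYS_ERR_HOLE,   /* all integers, but an index is missing */
} KeysKind;

/* True if key consists only of decimal digits; stores its value in *index. */
static bool key_to_index(const char *key, unsigned long *index)
{
    char *end;

    if (!isdigit((unsigned char)key[0])) {
        return false;
    }
    errno = 0;
    *index = strtoul(key, &end, 10);
    return errno == 0 && *end == '\0';
}

/* Classify the n distinct keys of one dictionary. */
static KeysKind classify_keys(const char **keys, size_t n)
{
    KeysKind kind = KEYS_LIST;
    unsigned long idx;
    size_t nints = 0;
    bool *seen;

    for (size_t i = 0; i < n; i++) {
        nints += key_to_index(keys[i], &idx);
    }
    if (nints == 0) {
        return KEYS_DICT;
    }
    if (nints != n) {
        return KEYS_ERR_MIXED;
    }

    seen = calloc(n, sizeof(*seen));
    assert(seen);
    for (size_t i = 0; i < n; i++) {
        key_to_index(keys[i], &idx);
        /* With n keys, an index >= n or a repeated index implies a gap. */
        if (idx >= n || seen[idx]) {
            kind = KEYS_ERR_HOLE;
            break;
        }
        seen[idx] = true;
    }
    free(seen);
    return kind;
}
```

Applied to the examples above, the keys 1, 0, 2 form a list, the keys c and 0 are inconsistent, and the keys 0 and 2 have a hole.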
    • keyval: Restrict key components to valid QAPI names · f7400483
      Markus Armbruster authored
      
      Until now, key components are separated by '.'.  This leaves little
      room for evolving the syntax, and is incompatible with the __RFQDN_
      prefix convention for downstream extensions.
      
      Since key components will be commonly used as QAPI member names by the
      QObject input visitor, we can just as well borrow the QAPI naming
      rules here: letters, digits, hyphen and period starting with a letter,
      with an optional __RFQDN_ prefix for downstream extensions.
      
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Message-Id: <1488317230-26248-20-git-send-email-armbru@redhat.com>
      f7400483
    • keyval: New keyval_parse() · d454dbe0
      Markus Armbruster authored
      
      keyval_parse() parses KEY=VALUE,... into a QDict.  Works like
      qemu_opts_parse(), except:
      
      * Returns a QDict instead of a QemuOpts (d'oh).
      
      * Supports nesting, unlike QemuOpts: a KEY is split into key
        fragments at '.' (dotted key convention; the block layer does
        something similar on top of QemuOpts).  The key fragments are QDict
        keys, and the last one's value is updated to VALUE.
      
      * Each key fragment may be up to 127 bytes long.  qemu_opts_parse()
        limits the entire key to 127 bytes.
      
      * Overlong key fragments are rejected.  qemu_opts_parse() silently
        truncates them.
      
      * Empty key fragments are rejected.  qemu_opts_parse() happily
        accepts empty keys.
      
      * It does not store the returned value.  qemu_opts_parse() stores it
        in the QemuOptsList.
      
      * It does not treat parameter "id" specially.  qemu_opts_parse()
        ignores all but the first "id", and fails when its value isn't
        id_wellformed(), or duplicate (a QemuOpts with the same ID is
        already stored).  It also screws up when a value contains ",id=".
      
      * Implied value is not supported.  qemu_opts_parse() desugars "foo" to
        "foo=on", and "nofoo" to "foo=off".
      
      * An implied key's value can't be empty, and can't contain ','.
      
      I intend to grow this into a saner replacement for QemuOpts.  It'll
      take time, though.
      
      Note: keyval_parse() provides no way to do lists, and its key syntax
      is incompatible with the __RFQDN_ prefix convention for downstream
      extensions, because it blindly splits at '.', even in __RFQDN_.  Both
      issues will be addressed later in the series.
      
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Message-Id: <1488317230-26248-4-git-send-email-armbru@redhat.com>
      d454dbe0
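The dotted-key rules listed above (split at '.', at most 127 bytes per fragment, no empty fragments) can be sketched with a small splitter. `split_key` and its fixed-size output array are hypothetical; keyval_parse() itself builds nested QDicts rather than an array of strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FRAGMENT_LEN 127  /* per-fragment limit from the commit message */

/*
 * Split key at '.' into fragments, copying each into frags[i].
 * Returns the number of fragments, or -1 if a fragment is empty or
 * overlong, or if there are more than max_frags of them.
 */
static int split_key(const char *key, char frags[][MAX_FRAGMENT_LEN + 1],
                     int max_frags)
{
    int n = 0;

    assert(key && frags && max_frags > 0);
    for (;;) {
        size_t len = strcspn(key, ".");

        /* Reject instead of truncating or accepting empty fragments. */
        if (len == 0 || len > MAX_FRAGMENT_LEN || n == max_frags) {
            return -1;
        }
        memcpy(frags[n], key, len);
        frags[n][len] = '\0';
        n++;
        if (key[len] == '\0') {
            return n;
        }
        key += len + 1;  /* skip the '.' */
    }
}
```

For a key such as a.b.c this yields the three fragments a, b and c, while a..b, a leading or trailing '.', and any fragment of 128 bytes or more are rejected.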
  14. Mar 03, 2017
  15. Feb 28, 2017
  16. Feb 23, 2017