  1. Jan 12, 2022
    • Stefan Hajnoczi's avatar
      aio-posix: split poll check from ready handler · 826cc324
      Stefan Hajnoczi authored
      
      Adaptive polling measures the execution time of the polling check plus
      handlers called when a polled event becomes ready. Handlers can take a
      significant amount of time, making it look like polling was running for
      a long time when in fact the event handler was running for a long time.
      
      For example, on Linux the io_submit(2) syscall invoked when a virtio-blk
      device's virtqueue becomes ready can take 10s of microseconds. This
      can exceed the default polling interval (32 microseconds) and cause
      adaptive polling to stop polling.
      
      By excluding the handler's execution time from the polling check we make
      the adaptive polling calculation more accurate. As a result, the event
      loop now stays in polling mode where previously it would have fallen
      back to file descriptor monitoring.
      
      The following data was collected with virtio-blk num-queues=2
      event_idx=off using an IOThread. Before:
      
      168k IOPS, IOThread syscalls:
      
        9837.115 ( 0.020 ms): IO iothread1/620155 io_submit(ctx_id: 140512552468480, nr: 16, iocbpp: 0x7fcb9f937db0)    = 16
        9837.158 ( 0.002 ms): IO iothread1/620155 write(fd: 103, buf: 0x556a2ef71b88, count: 8)                         = 8
        9837.161 ( 0.001 ms): IO iothread1/620155 write(fd: 104, buf: 0x556a2ef71b88, count: 8)                         = 8
        9837.163 ( 0.001 ms): IO iothread1/620155 ppoll(ufds: 0x7fcb90002800, nfds: 4, tsp: 0x7fcb9f1342d0, sigsetsize: 8) = 3
        9837.164 ( 0.001 ms): IO iothread1/620155 read(fd: 107, buf: 0x7fcb9f939cc0, count: 512)                        = 8
        9837.174 ( 0.001 ms): IO iothread1/620155 read(fd: 105, buf: 0x7fcb9f939cc0, count: 512)                        = 8
        9837.176 ( 0.001 ms): IO iothread1/620155 read(fd: 106, buf: 0x7fcb9f939cc0, count: 512)                        = 8
        9837.209 ( 0.035 ms): IO iothread1/620155 io_submit(ctx_id: 140512552468480, nr: 32, iocbpp: 0x7fca7d0cebe0)    = 32
      
      After:

      174k IOPS (+3.6%), IOThread syscalls:
      
        9809.566 ( 0.036 ms): IO iothread1/623061 io_submit(ctx_id: 140539805028352, nr: 32, iocbpp: 0x7fd0cdd62be0)    = 32
        9809.625 ( 0.001 ms): IO iothread1/623061 write(fd: 103, buf: 0x5647cfba5f58, count: 8)                         = 8
        9809.627 ( 0.002 ms): IO iothread1/623061 write(fd: 104, buf: 0x5647cfba5f58, count: 8)                         = 8
        9809.663 ( 0.036 ms): IO iothread1/623061 io_submit(ctx_id: 140539805028352, nr: 32, iocbpp: 0x7fd0d0388b50)    = 32
      
      Notice that ppoll(2) and eventfd read(2) syscalls are eliminated because
      the IOThread stays in polling mode instead of falling back to file
      descriptor monitoring.
      
      As usual, polling is not implemented on Windows, so this patch ignores
      the new io_poll_ready() callback in aio-win32.c.
      
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Message-id: 20211207132336.36627-2-stefanha@redhat.com
      
      [Fixed up aio_set_event_notifier() calls in
      tests/unit/test-fdmon-epoll.c added after this series was queued.
      --Stefan]
      
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      826cc324
  6. Nov 03, 2021
    • Lukas Straub's avatar
      colo: Don't dump colo cache if dump-guest-core=off · e5fdf920
      Lukas Straub authored
      
      One might set dump-guest-core=off to make coredumps smaller while
      still allowing many QEMU bugs to be debugged. Extend this option to
      cover the colo cache.
      
      Signed-off-by: Lukas Straub <lukasstraub2@web.de>
      Reviewed-by: Juan Quintela <quintela@redhat.com>
      Signed-off-by: Juan Quintela <quintela@redhat.com>
      e5fdf920
    • Rao, Lei's avatar
      Changed the last-mode to none of first start COLO · 2b9f6bf3
      Rao, Lei authored
      
      When we first start COLO, the last-mode is as follows:
      { "execute": "query-colo-status" }
      {"return": {"last-mode": "primary", "mode": "primary", "reason": "none"}}
      
      This last-mode is unreasonable. After this patch, it will be changed
      to the following:
      { "execute": "query-colo-status" }
      {"return": {"last-mode": "none", "mode": "primary", "reason": "none"}}
      
      Signed-off-by: Lei Rao <lei.rao@intel.com>
      Reviewed-by: Juan Quintela <quintela@redhat.com>
      Signed-off-by: Juan Quintela <quintela@redhat.com>
      2b9f6bf3
    • Rao, Lei's avatar
      Removed the qemu_fclose() in colo_process_incoming_thread · 04dd8916
      Rao, Lei authored
      
      After the live migration, the related fd will be cleaned up in
      migration_incoming_state_destroy(). So, the qemu_fclose()
      in colo_process_incoming_thread is not necessary.
      
      Signed-off-by: Lei Rao <lei.rao@intel.com>
      Reviewed-by: Juan Quintela <quintela@redhat.com>
      Signed-off-by: Juan Quintela <quintela@redhat.com>
      04dd8916
    • Rao, Lei's avatar
      colo: fixed 'Segmentation fault' when the simplex mode PVM poweroff · ac183dac
      Rao, Lei authored
      
      The GDB stack is as follows:
      Program terminated with signal SIGSEGV, Segmentation fault.
      0  object_class_dynamic_cast (class=0x55c8f5d2bf50, typename=0x55c8f2f7379e "qio-channel") at qom/object.c:832
               if (type->class->interfaces &&
      [Current thread is 1 (Thread 0x7f756e97eb00 (LWP 1811577))]
      (gdb) bt
      0  object_class_dynamic_cast (class=0x55c8f5d2bf50, typename=0x55c8f2f7379e "qio-channel") at qom/object.c:832
      1  0x000055c8f2c3dd14 in object_dynamic_cast (obj=0x55c8f543ac00, typename=0x55c8f2f7379e "qio-channel") at qom/object.c:763
      2  0x000055c8f2c3ddce in object_dynamic_cast_assert (obj=0x55c8f543ac00, typename=0x55c8f2f7379e "qio-channel",
          file=0x55c8f2f73780 "migration/qemu-file-channel.c", line=117, func=0x55c8f2f73800 <__func__.18724> "channel_shutdown") at qom/object.c:786
      3  0x000055c8f2bbc6ac in channel_shutdown (opaque=0x55c8f543ac00, rd=true, wr=true, errp=0x0) at migration/qemu-file-channel.c:117
      4  0x000055c8f2bba56e in qemu_file_shutdown (f=0x7f7558070f50) at migration/qemu-file.c:67
      5  0x000055c8f2ba5373 in migrate_fd_cancel (s=0x55c8f4ccf3f0) at migration/migration.c:1699
      6  0x000055c8f2ba1992 in migration_shutdown () at migration/migration.c:187
      7  0x000055c8f29a5b77 in main (argc=69, argv=0x7fff3e9e8c08, envp=0x7fff3e9e8e38) at vl.c:4512
      
      The root cause is that migrate_fd_cancel() still tries to shut down
      from_dst_file after it has been closed by qemu_close() in
      colo_process_checkpoint(). So, we should set
      s->rp_state.from_dst_file = NULL after qemu_close().
      
      Signed-off-by: Lei Rao <lei.rao@intel.com>
      Reviewed-by: Juan Quintela <quintela@redhat.com>
      Signed-off-by: Juan Quintela <quintela@redhat.com>
      ac183dac
    • Rao, Lei's avatar
      Fixed SVM hang when do failover before PVM crash · 684bfd18
      Rao, Lei authored
      
      This patch fixes the following hang:
          Thread 1 (Thread 0x7f34ee738d80 (LWP 11212)):
          #0 __pthread_clockjoin_ex (threadid=139847152957184, thread_return=0x7f30b1febf30, clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:145
          #1 0x0000563401998e36 in qemu_thread_join (thread=0x563402d66610) at util/qemu-thread-posix.c:587
          #2 0x00005634017a79fa in process_incoming_migration_co (opaque=0x0) at migration/migration.c:502
          #3 0x00005634019b59c9 in coroutine_trampoline (i0=63395504, i1=22068) at util/coroutine-ucontext.c:115
          #4 0x00007f34ef860660 in ?? () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91 from /lib/x86_64-linux-gnu/libc.so.6
          #5 0x00007f30b21ee730 in ?? ()
          #6 0x0000000000000000 in ?? ()
      
          Thread 13 (Thread 0x7f30b3dff700 (LWP 11747)):
          #0  __lll_lock_wait (futex=futex@entry=0x56340218ffa0 <qemu_global_mutex>, private=0) at lowlevellock.c:52
          #1  0x00007f34efa000a3 in __GI___pthread_mutex_lock (mutex=0x56340218ffa0 <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:80
          #2  0x0000563401997f99 in qemu_mutex_lock_impl (mutex=0x56340218ffa0 <qemu_global_mutex>, file=0x563401b7a80e "migration/colo.c", line=806) at util/qemu-thread-posix.c:78
          #3  0x0000563401407144 in qemu_mutex_lock_iothread_impl (file=0x563401b7a80e "migration/colo.c", line=806) at /home/workspace/colo-qemu/cpus.c:1899
          #4  0x00005634017ba8e8 in colo_process_incoming_thread (opaque=0x563402d664c0) at migration/colo.c:806
          #5  0x0000563401998b72 in qemu_thread_start (args=0x5634039f8370) at util/qemu-thread-posix.c:519
          #6  0x00007f34ef9fd609 in start_thread (arg=<optimized out>) at pthread_create.c:477
          #7  0x00007f34ef924293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      
          The QEMU main thread is holding the lock:
          (gdb) p qemu_global_mutex
          $1 = {lock = {_data = {lock = 2, __count = 0, __owner = 11212, __nusers = 9, __kind = 0, __spins = 0, __elision = 0, __list = {_prev = 0x0, __next = 0x0}},
           __size = "\002\000\000\000\000\000\000\000\314+\000\000\t", '\000' <repeats 26 times>, __align = 2}, file = 0x563401c07e4b "util/main-loop.c", line = 240,
          initialized = true}
      
      From the call trace, we can see this is a deadlock bug: the QEMU main
      thread holds the global mutex while waiting for the COLO thread to end,
      and the COLO thread wants to acquire the global mutex, which causes a
      deadlock. So, we should release qemu_global_mutex before waiting for
      the COLO thread to end.
      
      Signed-off-by: Lei Rao <lei.rao@intel.com>
      Reviewed-by: Li Zhijian <lizhijian@cn.fujitsu.com>
      Reviewed-by: Juan Quintela <quintela@redhat.com>
      Signed-off-by: Juan Quintela <quintela@redhat.com>
      684bfd18
    • Rao, Lei's avatar
      Fixed qemu crash when guest power off in COLO mode · aa505f8e
      Rao, Lei authored
      
      This patch fixes the following:
      qemu-system-x86_64: invalid runstate transition: 'shutdown' -> 'running'
      Aborted (core dumped)
      The gdb backtrace is as follows:
      0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
      1  0x00007faa3d613859 in __GI_abort () at abort.c:79
      2  0x000055c5a21268fd in runstate_set (new_state=RUN_STATE_RUNNING) at vl.c:723
      3  0x000055c5a1f8cae4 in vm_prepare_start () at /home/workspace/colo-qemu/cpus.c:2206
      4  0x000055c5a1f8cb1b in vm_start () at /home/workspace/colo-qemu/cpus.c:2213
      5  0x000055c5a2332bba in migration_iteration_finish (s=0x55c5a4658810) at migration/migration.c:3376
      6  0x000055c5a2332f3b in migration_thread (opaque=0x55c5a4658810) at migration/migration.c:3527
      7  0x000055c5a251d68a in qemu_thread_start (args=0x55c5a5491a70) at util/qemu-thread-posix.c:519
      8  0x00007faa3d7e9609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      9  0x00007faa3d710293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      
      Signed-off-by: Lei Rao <lei.rao@intel.com>
      Reviewed-by: Juan Quintela <quintela@redhat.com>
      Signed-off-by: Juan Quintela <quintela@redhat.com>
      aa505f8e
    • Rao, Lei's avatar
      ae4c2099
    • Juan Quintela's avatar
      migration: Zero migration compression counters · 02abee3d
      Juan Quintela authored
      
      Based on previous patch from yuxiating <yuxiating@huawei.com>
      
      Signed-off-by: Juan Quintela <quintela@redhat.com>
      02abee3d
    • yuxiating's avatar
      migration: initialise compression_counters for a new migration · fa0b31d5
      yuxiating authored
      
      If a compression migration fails or is cancelled, querying
      compression_counters during the next compression migration returns
      stale values.
      
      Signed-off-by: yuxiating <yuxiating@huawei.com>
      Reviewed-by: Juan Quintela <quintela@redhat.com>
      Signed-off-by: Juan Quintela <quintela@redhat.com>
      fa0b31d5
    • Laurent Vivier's avatar
      migration: provide an error message to migration_cancel() · 458fecca
      Laurent Vivier authored
      
      This avoids calling migrate_get_current() in the caller function,
      since migration_cancel() already needs the pointer to the current
      migration state.
      
      Signed-off-by: Laurent Vivier <lvivier@redhat.com>
      Reviewed-by: Juan Quintela <quintela@redhat.com>
      Signed-off-by: Juan Quintela <quintela@redhat.com>
      458fecca