  1. Nov 01, 2021
    • migration: Add migrate_add_blocker_internal() · 60fd6801
      Peter Xu authored
      
      An internal version that removes -only-migratable implications.  It can be used
      for temporary migration blockers like dump-guest-memory.
      
      Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
      Reviewed-by: Juan Quintela <quintela@redhat.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Juan Quintela <quintela@redhat.com>
      60fd6801
    • migration: Make migration blocker work for snapshots too · 4c170330
      Peter Xu authored
      
      save_snapshot() checks for migration blockers, which looks sane.  In the
      meantime, we should also teach the blocker add helper to fail while a
      snapshot is in progress, just as it does for migrations.
      
      Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Juan Quintela <quintela@redhat.com>
      Signed-off-by: Juan Quintela <quintela@redhat.com>
      4c170330
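
The relationship between the two blocker helpers described in the two commits above can be sketched as a toy model. This is not QEMU's code: `struct vm_state`, the `BLOCKER_*` result codes, and the function names are hypothetical stand-ins chosen for illustration. The sketch assumes the public helper rejects new blockers under -only-migratable, while the internal variant skips that check, and that both refuse while a migration or snapshot is running.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model only: these types and names are hypothetical stand-ins,
 * not QEMU's real definitions.
 */
struct vm_state {
    bool only_migratable;    /* models the -only-migratable option */
    bool migration_running;  /* an outgoing migration is in progress */
    bool snapshot_running;   /* a snapshot save is in progress */
    size_t nr_blockers;      /* number of registered blockers */
};

enum blocker_result {
    BLOCKER_OK = 0,
    BLOCKER_BUSY = -1,       /* migration or snapshot already running */
    BLOCKER_FORBIDDEN = -2,  /* refused because of -only-migratable */
};

/* Internal variant: does not look at only_migratable at all. */
int add_blocker_internal(struct vm_state *vm)
{
    if (vm->migration_running || vm->snapshot_running) {
        return BLOCKER_BUSY;
    }
    vm->nr_blockers++;
    return BLOCKER_OK;
}

/* Public variant: applies the -only-migratable check first. */
int add_blocker(struct vm_state *vm)
{
    if (vm->only_migratable) {
        return BLOCKER_FORBIDDEN;
    }
    return add_blocker_internal(vm);
}

/* A snapshot (like a migration) may start only if no blocker is registered. */
bool can_start_snapshot(const struct vm_state *vm)
{
    return vm->nr_blockers == 0;
}
```

In this model a temporary user such as dump-guest-memory would call the internal variant, so it can still register a short-lived blocker on a VM started with -only-migratable.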
    • migration/dirtyrate: implement dirty-ring dirtyrate calculation · 0e21bf24
      Hyman Huang authored
      
      Use the dirty ring feature to implement dirty rate calculation.

      Introduce a mode option in the QMP calc_dirty_rate command to specify
      which method is used when calculating the dirty rate; either
      page-sampling or dirty-ring should be passed.

      Introduce the "dirty_ring:-r" option in the HMP calc_dirty_rate command
      to indicate that the dirty ring method should be used for the calculation.
      
      Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
      Message-Id: <7db445109bd18125ce8ec86816d14f6ab5de6a7d.1624040308.git.huangy81@chinatelecom.cn>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Juan Quintela <quintela@redhat.com>
      Signed-off-by: Juan Quintela <quintela@redhat.com>
      0e21bf24
    • migration/dirtyrate: move init step of calculation to main thread · 9865d0f6
      Hyman Huang authored
      
      Since the main thread may "query dirty rate" at any time, it's better
      to move the init step into the main thread so that the synchronization
      overhead between "main" and "get_dirtyrate" can be reduced.
      
      Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
      Message-Id: <109f8077518ed2f13068e3bfb10e625e964780f1.1624040308.git.huangy81@chinatelecom.cn>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Juan Quintela <quintela@redhat.com>
      Signed-off-by: Juan Quintela <quintela@redhat.com>
      9865d0f6
    • migration/dirtyrate: adjust order of registering thread · 15eb2d64
      Hyman Huang authored
      
      Register the get_dirtyrate thread in advance so that both the
      page-sampling and dirty-ring modes are covered.
      
      Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
      Message-Id: <d7727581a8e86d4a42fc3eacf7f310419b9ebf7e.1624040308.git.huangy81@chinatelecom.cn>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Juan Quintela <quintela@redhat.com>
      Signed-off-by: Juan Quintela <quintela@redhat.com>
      15eb2d64
    • migration/dirtyrate: introduce struct and adjust DirtyRateStat · 71864ead
      Hyman Huang authored
      
      Introduce "DirtyRateMeasureMode" to specify which method is used to
      calculate the dirty rate, and introduce "DirtyRateVcpu" to store the
      dirty rate of each vCPU.

      Use a union to store the stat data of a specific mode.
      
      Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
      Message-Id: <661c98c40f40e163aa58334337af8f3ddf41316a.1624040308.git.huangy81@chinatelecom.cn>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Juan Quintela <quintela@redhat.com>
      Signed-off-by: Juan Quintela <quintela@redhat.com>
      71864ead
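
The "mode tag plus union" layout described in the commit above can be sketched as follows. This is an illustration under stated assumptions, not the actual QEMU definitions: the type names `MeasureMode`, `VcpuRate`, `SamplingStat`, `RingStat`, and `RateStat`, all field names, and the helper functions are hypothetical, and the idea that the overall dirty-ring figure is the sum of the per-vCPU rates is an assumption as well.

```c
#include <stdint.h>
#include <stdlib.h>

/* Which method produced the measurement (hypothetical names). */
typedef enum {
    MEASURE_MODE_PAGE_SAMPLING,
    MEASURE_MODE_DIRTY_RING,
} MeasureMode;

/* Dirty rate of one vCPU; field names are assumptions. */
typedef struct {
    int id;
    int64_t dirty_rate;
} VcpuRate;

/* Per-mode payloads; only one of them is live at a time. */
typedef struct {
    int64_t total_dirty_samples;
    int64_t total_sample_count;
} SamplingStat;

typedef struct {
    int nvcpu;
    VcpuRate *rates;
} RingStat;

typedef struct {
    MeasureMode mode;    /* tag: selects the valid union member */
    int64_t dirty_rate;  /* overall result */
    union {
        SamplingStat page_sampling;
        RingStat dirty_ring;
    } u;
} RateStat;

/* Prepare a stat object for dirty-ring mode; returns 0 on success. */
int rate_stat_init_ring(RateStat *s, int nvcpu)
{
    int i;

    if (nvcpu <= 0) {
        return -1;
    }
    s->mode = MEASURE_MODE_DIRTY_RING;
    s->dirty_rate = 0;
    s->u.dirty_ring.nvcpu = nvcpu;
    s->u.dirty_ring.rates = calloc((size_t)nvcpu, sizeof(VcpuRate));
    if (!s->u.dirty_ring.rates) {
        return -1;
    }
    for (i = 0; i < nvcpu; i++) {
        s->u.dirty_ring.rates[i].id = i;
    }
    return 0;
}

/* Sum the per-vCPU rates; returns -1 if the tag says another mode. */
int64_t rate_stat_sum_ring(const RateStat *s)
{
    int64_t sum = 0;
    int i;

    if (s->mode != MEASURE_MODE_DIRTY_RING) {
        return -1;
    }
    for (i = 0; i < s->u.dirty_ring.nvcpu; i++) {
        sum += s->u.dirty_ring.rates[i].dirty_rate;
    }
    return sum;
}

/* Release the per-vCPU array, which exists only in dirty-ring mode. */
void rate_stat_destroy(RateStat *s)
{
    if (s->mode == MEASURE_MODE_DIRTY_RING) {
        free(s->u.dirty_ring.rates);
        s->u.dirty_ring.rates = NULL;
        s->u.dirty_ring.nvcpu = 0;
    }
}
```

Checking the tag before touching a union member keeps the page-sampling and dirty-ring payloads from being misread as each other.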
    • memory: make global_dirty_tracking a bitmask · 63b41db4
      Hyman Huang authored
      
      Since the dirty ring has been introduced, there are two methods
      to track the dirty pages of a VM. The word "logging" hints at one
      particular method, so renaming global_dirty_log to
      global_dirty_tracking makes the description more accurate.

      Dirty rate measurement may start or stop dirty tracking during
      calculation. This conflicts with migration, because stopping dirty
      tracking makes migration miss dirty pages.

      Making global_dirty_tracking a bitmask lets both migration and
      dirty rate measurement work correctly. Introduce GLOBAL_DIRTY_MIGRATION
      and GLOBAL_DIRTY_DIRTY_RATE to distinguish what the current dirty
      tracking is for: migration or dirty rate.
      
      Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
      Message-Id: <9c9388657cfa0301bd2c1cfa36e7cf6da4aeca19.1624040308.git.huangy81@chinatelecom.cn>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Juan Quintela <quintela@redhat.com>
      Signed-off-by: Juan Quintela <quintela@redhat.com>
      63b41db4
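
A minimal sketch of why a bitmask resolves the conflict described in the commit above: tracking stays on as long as any user still holds its bit. The two flag names come from the commit message, but their numeric values, `GLOBAL_DIRTY_MASK`, `struct tracker`, and the helper functions are assumptions made for illustration and not QEMU's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag names are from the commit message; the bit values are assumed. */
#define GLOBAL_DIRTY_MIGRATION   (1U << 0)
#define GLOBAL_DIRTY_DIRTY_RATE  (1U << 1)
#define GLOBAL_DIRTY_MASK        (GLOBAL_DIRTY_MIGRATION | GLOBAL_DIRTY_DIRTY_RATE)

/* Hypothetical tracker: counts how often the backend is switched. */
struct tracker {
    unsigned int flags;  /* which users currently need dirty tracking */
    int backend_starts;  /* times tracking was actually turned on */
    int backend_stops;   /* times tracking was actually turned off */
};

/* Register one user; the backend starts only on the 0 -> non-0 edge. */
void tracking_start(struct tracker *t, unsigned int flag)
{
    bool was_off = (t->flags == 0);

    assert(flag != 0 && (flag & ~GLOBAL_DIRTY_MASK) == 0);
    t->flags |= flag;
    if (was_off) {
        t->backend_starts++;
    }
}

/* Unregister one user; the backend stops only when nobody is left. */
void tracking_stop(struct tracker *t, unsigned int flag)
{
    assert(flag != 0 && (flag & ~GLOBAL_DIRTY_MASK) == 0);
    if ((t->flags & flag) == 0) {
        return;  /* this user was not tracking */
    }
    t->flags &= ~flag;
    if (t->flags == 0) {
        t->backend_stops++;
    }
}

bool tracking_enabled(const struct tracker *t)
{
    return t->flags != 0;
}
```

With a single boolean, the dirty rate measurement finishing would have switched tracking off underneath a running migration; with one bit per user, the stop request only clears its own bit.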
    • KVM: introduce dirty_pages and kvm_dirty_ring_enabled · 7786ae40
      Hyman Huang authored
      
      dirty_pages is used to calculate the dirty rate via the dirty ring. When
      the dirty ring is enabled, kvm-reaper increases the dirty page count
      after gfns are dirtied.

      kvm_dirty_ring_enabled shows whether kvm-reaper is working. The dirtyrate
      thread can use it to check whether the measurement can be based on the
      dirty ring feature.
      
      Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
      Message-Id: <fee5fb2ab17ec2159405fc54a3cff8e02322f816.1624040308.git.huangy81@chinatelecom.cn>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Juan Quintela <quintela@redhat.com>
      Signed-off-by: Juan Quintela <quintela@redhat.com>
      7786ae40
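
One way a per-vCPU dirty page counter like the one in the commit above could be turned into a rate is to sample it at the start and end of the measurement window. The commit message does not give the formula, so the following is only a sketch under assumptions: the function name, the MiB/s unit, and the millisecond interval are all hypothetical.

```c
#include <stdint.h>

/*
 * Convert two samples of a monotonically increasing dirty-page counter
 * into a rate in MiB/s. Returns -1 for a non-positive interval.
 * Hypothetical helper for illustration; not taken from QEMU.
 */
int64_t dirty_rate_mib_per_sec(uint64_t pages_start, uint64_t pages_end,
                               uint64_t page_size, int64_t elapsed_ms)
{
    uint64_t dirtied_bytes;

    if (elapsed_ms <= 0) {
        return -1;
    }
    /* Unsigned subtraction stays correct even if the counter wrapped once. */
    dirtied_bytes = (pages_end - pages_start) * page_size;
    /* Scale to one second first, then convert bytes to MiB. */
    return (int64_t)((dirtied_bytes * 1000 / (uint64_t)elapsed_ms) >> 20);
}
```

For example, 25600 pages of 4096 bytes dirtied in one second amount to 100 MiB/s; the same page count in half a second gives 200 MiB/s.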
    • migration/rdma: Fix out of order wrid · b390afd8
      Li Zhijian authored
      
      destination:
      ../qemu/build/qemu-system-x86_64 -enable-kvm -netdev tap,id=hn0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device e1000,netdev=hn0,mac=50:52:54:00:11:22 -boot c -drive if=none,file=./Fedora-rdma-server-migration.qcow2,id=drive-virtio-disk0 -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -vga qxl -spice streaming-video=filter,port=5902,disable-ticketing -incoming rdma:192.168.22.23:8888
      qemu-system-x86_64: -spice streaming-video=filter,port=5902,disable-ticketing: warning: short-form boolean option 'disable-ticketing' deprecated
      Please use disable-ticketing=on instead
      QEMU 6.0.50 monitor - type 'help' for more information
      (qemu) trace-event qemu_rdma_block_for_wrid_miss on
      (qemu) dest_init RDMA Device opened: kernel name rxe_eth0 uverbs device name uverbs2, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs2, infiniband class device path /sys/class/infiniband/rxe_eth0, transport: (2) Ethernet
      qemu_rdma_block_for_wrid_miss A Wanted wrid CONTROL SEND (2000) but got CONTROL RECV (4000)
      
      source:
      ../qemu/build/qemu-system-x86_64 -enable-kvm -netdev tap,id=hn0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device e1000,netdev=hn0,mac=50:52:54:00:11:22 -boot c -drive if=none,file=./Fedora-rdma-server.qcow2,id=drive-virtio-disk0 -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -vga qxl -spice streaming-video=filter,port=5901,disable-ticketing -S
      qemu-system-x86_64: -spice streaming-video=filter,port=5901,disable-ticketing: warning: short-form boolean option 'disable-ticketing' deprecated
      Please use disable-ticketing=on instead
      QEMU 6.0.50 monitor - type 'help' for more information
      (qemu)
      (qemu) trace-event qemu_rdma_block_for_wrid_miss on
      (qemu) migrate -d rdma:192.168.22.23:8888
      source_resolve_host RDMA Device opened: kernel name rxe_eth0 uverbs device name uverbs2, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs2, infiniband class device path /sys/class/infiniband/rxe_eth0, transport: (2) Ethernet
      (qemu) qemu_rdma_block_for_wrid_miss A Wanted wrid WRITE RDMA (1) but got CONTROL RECV (4000)
      
      NOTE: we use soft RoCE as the rdma device.
      [root@iaas-rpma images]# rdma link show rxe_eth0/1
      link rxe_eth0/1 state ACTIVE physical_state LINK_UP netdev eth0
      
      This migration cannot complete when an out-of-order (OOO) CQ event occurs.
      The send queue and the receive queue share the same completion queue, and
      qemu_rdma_block_for_wrid() drops the CQ events it is not interested in.
      However, an event dropped by qemu_rdma_block_for_wrid() may be one that it
      wants later, in which case qemu_rdma_block_for_wrid() blocks forever.

      OOO cases can occur on both the source side and the destination side.
      Blocking forever happens only when SEND and RECV are out of order; OOO
      between 'WRITE RDMA' and 'RECV' doesn't matter.

      The OOO sequence is shown below:
             source                             destination
            rdma_write_one()                   qemu_rdma_registration_handle()
      1.    S1: post_recv X                    D1: post_recv Y
      2.    wait for recv CQ event X
      3.                                       D2: post_send X     ---------------+
      4.                                       wait for send CQ send event X (D2) |
      5.    recv CQ event X reaches (D2)                                          |
      6.  +-S2: post_send Y                                                       |
      7.  | wait for send CQ event Y                                              |
      8.  |                                    recv CQ event Y (S2) (drop it)     |
      9.  +-send CQ event Y reaches (S2)                                          |
      10.                                      send CQ event X reaches (D2)  -----+
      11.                                      wait recv CQ event Y (dropped by (8))
      
      Although hardware IB worked fine in about a hundred runs of mine, the IB
      specification doesn't guarantee the CQ order in such a case.

      Here we introduce an independent send completion queue to separate the
      ibv_post_send completion queue from the original mixed completion queue.
      It helps us poll the specific CQE we are really interested in.
      
      Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
      Reviewed-by: Juan Quintela <quintela@redhat.com>
      Signed-off-by: Juan Quintela <quintela@redhat.com>
      b390afd8
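
The hang and the fix described in the commit above can be reproduced in a toy model that does not touch real RDMA hardware or libibverbs. Everything here is a hypothetical stand-in: `struct toy_cq`, the `TOY_*` work-request IDs, and the helper functions only mimic the "drop what you are not waiting for" behaviour the commit message attributes to qemu_rdma_block_for_wrid().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy work-request IDs; the real QEMU names and values differ. */
enum toy_wrid {
    TOY_SEND_CONTROL,
    TOY_RECV_CONTROL,
};

#define TOY_CQ_DEPTH 8

/* A completion queue modelled as a small FIFO of work-request IDs. */
struct toy_cq {
    enum toy_wrid events[TOY_CQ_DEPTH];
    size_t head;  /* next event to pop */
    size_t tail;  /* next free slot */
};

/* Append one completion event to the queue. */
void toy_cq_push(struct toy_cq *cq, enum toy_wrid id)
{
    assert(cq->tail < TOY_CQ_DEPTH);
    cq->events[cq->tail++] = id;
}

/*
 * Wait for one specific completion, discarding everything else.
 * Returns false when the queue runs dry, which is the point where a
 * real blocking wait would hang forever.
 */
bool toy_wait_for(struct toy_cq *cq, enum toy_wrid wanted)
{
    while (cq->head < cq->tail) {
        enum toy_wrid got = cq->events[cq->head++];

        if (got == wanted) {
            return true;
        }
        /* Unwanted completion: dropped and never seen again. */
    }
    return false;
}
```

With one shared queue, a RECV completion that arrives ahead of the awaited SEND completion is discarded, so the later wait for RECV never succeeds. With separate send and receive queues, each wait only ever sees events of its own kind, so arrival order across the two queues no longer matters.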
  2. Oct 30, 2021