  1. Jul 22, 2022
    • accel/kvm: Avoid Coverity warning in query_stats() · d12dd9c7
      Peter Maydell authored
      
      Coverity complains that there is a codepath in the query_stats()
      function where it can leak the memory pointed to by stats_list.  This
      can only happen if the caller passes something other than
      STATS_TARGET_VM or STATS_TARGET_VCPU as the 'target', which no
      callsite does.  Enforce this assumption using g_assert_not_reached(),
      so that if we have a future bug we hit the assert rather than
      silently leaking memory.
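
      For illustration, a minimal standalone sketch of that pattern (this is not
      the QEMU query_stats() code; the StatsTarget values merely mirror the ones
      named above). The default case turns an unexpected target into an
      immediate abort instead of a silent leak:

          #include <glib.h>
          #include <stdio.h>

          typedef enum { STATS_TARGET_VM, STATS_TARGET_VCPU } StatsTarget;

          static const char *target_name(StatsTarget target)
          {
              switch (target) {
              case STATS_TARGET_VM:
                  return "vm";
              case STATS_TARGET_VCPU:
                  return "vcpu";
              default:
                  /* A future caller passing a new target hits this assert
                   * instead of silently leaking whatever was allocated above. */
                  g_assert_not_reached();
              }
          }

          int main(void)
          {
              printf("%s\n", target_name(STATS_TARGET_VM));
              return 0;
          }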
      
      Resolves: Coverity CID 1490140
      Fixes: cc01a3f4 ("kvm: Support for querying fd-based stats")
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      Message-Id: <20220719134853.327059-1-peter.maydell@linaro.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d12dd9c7
    • docs: Add caveats for Windows as the build platform · b67de91e
      Bin Meng authored
      Commit cf60ccc3 ("cutils: Introduce bundle mechanism") introduced
      a Python script to populate a bundle directory using os.symlink() to
      point to the binaries in the pc-bios directory of the source tree.
      Commit 882084a0 ("datadir: Use bundle mechanism") removed previous
      logic in pc-bios/meson.build to create a link/copy of pc-bios binaries
      in the build tree so os.symlink() is the way to go.
      
      However, os.symlink() may fail [1] on Windows if the QEMU build process
      was started by an unprivileged Windows user. In that case the QEMU
      executables generated in the build tree are unable to load the default
      BIOS/firmware images, because the symbolic links are missing from the
      bundle directory.
      
      This commit updates the documentation by adding such caveats for users
      who want to build QEMU on the Windows platform.
      
      [1] https://docs.python.org/3/library/os.html#os.symlink
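
      As a hedged illustration of why the failure occurs (an assumption about
      the underlying Win32 behaviour, not part of this commit): creating a
      symbolic link on Windows requires either elevation or Developer Mode, and
      the unprivileged case surfaces as ERROR_PRIVILEGE_NOT_HELD. The file
      names below are made up for the example.

          /* Windows-only sketch: try to create a file symlink and report the
           * missing-privilege case that makes os.symlink() raise OSError. */
          #include <windows.h>
          #include <stdio.h>

          int main(void)
          {
              if (!CreateSymbolicLinkA("bios-256k.bin.lnk",
                                       "pc-bios\\bios-256k.bin", 0)) {
                  DWORD err = GetLastError();
                  if (err == ERROR_PRIVILEGE_NOT_HELD) {
                      fprintf(stderr,
                              "symlink needs elevation or Developer Mode\n");
                  } else {
                      fprintf(stderr, "CreateSymbolicLinkA failed: %lu\n", err);
                  }
                  return 1;
              }
              return 0;
          }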
      
      Signed-off-by: Bin Meng <bin.meng@windriver.com>
      Reviewed-by: Stefan Weil <sw@weilnetz.de>
      Reviewed-by: Akihiko Odaki <akihiko.odaki@gmail.com>
      Message-Id: <20220719135014.764981-1-bmeng.cn@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b67de91e
  2. Jul 21, 2022
    • Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging · 5288bee4
      Peter Maydell authored
      * Boolean statistics for KVM
      * Fix build on Haiku
      
      # gpg: Signature made Tue 19 Jul 2022 10:32:34 BST
      # gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
      # gpg:                issuer "pbonzini@redhat.com"
      # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
      # gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
      # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
      #      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83
      
      * tag 'for-upstream' of https://gitlab.com/bonzini/qemu:
        util: Fix broken build on Haiku
        kvm: add support for boolean statistics
        monitor: add support for boolean statistics
      
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      5288bee4
  3. Jul 20, 2022
    • Merge tag 'pull-migration-20220720c' of https://gitlab.com/dagrh/qemu into staging · fe16c833
      Peter Maydell authored
      
      Migration pull 2022-07-20
      
      This replaces yesterday's pull and:
        a) Fixes some test build errors without TLS
        b) Re-enables the zlib acceleration on s390
           now that we have Ilya's fix
      
        Hyman's dirty page rate limit set
        Ilya's fix for zlib vs migration
        Peter's postcopy-preempt
        Cleanup from Dan
        zero-copy tidy ups from Leo
        multifd doc fix from Juan
        Revert disable of zlib acceleration on s390x
      
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      
      # gpg: Signature made Wed 20 Jul 2022 12:18:56 BST
      # gpg:                using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7
      # gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full]
      # Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A  9FA9 0516 331E BC5B FDE7
      
      * tag 'pull-migration-20220720c' of https://gitlab.com/dagrh/qemu: (30 commits)
        Revert "gitlab: disable accelerated zlib for s390x"
        migration: Avoid false-positive on non-supported scenarios for zero-copy-send
        multifd: Document the locking of MultiFD{Send/Recv}Params
        migration/multifd: Report to user when zerocopy not working
        Add dirty-sync-missed-zero-copy migration stat
        QIOChannelSocket: Fix zero-copy flush returning code 1 when nothing sent
        migration: remove unreachable code after reading data
        tests: Add postcopy preempt tests
        tests: Add postcopy tls recovery migration test
        tests: Add postcopy tls migration test
        tests: Move MigrateCommon upper
        migration: Respect postcopy request order in preemption mode
        migration: Enable TLS for preempt channel
        migration: Export tls-[creds|hostname|authz] params to cmdline too
        migration: Add helpers to detect TLS capability
        migration: Add property x-postcopy-preempt-break-huge
        migration: Create the postcopy preempt channel asynchronously
        migration: Postcopy recover with preempt enabled
        migration: Postcopy preemption enablement
        migration: Postcopy preemption preparation on channel creation
        ...
      
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      fe16c833
    • Merge tag 'net-pull-request' of https://github.com/jasowang/qemu into staging · 8ec4bc3c
      Peter Maydell authored
      # gpg: Signature made Wed 20 Jul 2022 09:58:47 BST
      # gpg:                using RSA key EF04965B398D6211
      # gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>" [marginal]
      # gpg: WARNING: This key is not certified with sufficiently trusted signatures!
      # gpg:          It is not certain that the signature belongs to the owner.
      # Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211
      
      * tag 'net-pull-request' of https://github.com/jasowang/qemu: (25 commits)
        net/colo.c: fix segmentation fault when packet is not parsed correctly
        net/colo.c: No need to track conn_list for filter-rewriter
        net/colo: Fix a "double free" crash to clear the conn_list
        softmmu/runstate.c: add RunStateTransition support form COLO to PRELAUNCH
        vdpa: Add x-svq to NetdevVhostVDPAOptions
        vdpa: Add device migration blocker
        vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs
        vdpa: Buffer CVQ support on shadow virtqueue
        vdpa: manual forward CVQ buffers
        vhost-net-vdpa: add stubs for when no virtio-net device is present
        vdpa: Export vhost_vdpa_dma_map and unmap calls
        vhost: Add svq avail_handler callback
        vhost: add vhost_svq_poll
        vhost: Expose vhost_svq_add
        vhost: add vhost_svq_push_elem
        vhost: Track number of descs in SVQDescState
        vhost: Add SVQDescState
        vhost: Decouple vhost_svq_add from VirtQueueElement
        vhost: Check for queue full at vhost_svq_add
        vhost: Move vhost_svq_kick call to vhost_svq_add
        ...
      
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      8ec4bc3c
    • Merge tag 'pull-request-2022-07-20' of https://gitlab.com/thuth/qemu into staging · f45fd24c
      Peter Maydell authored
      * Fixes for s390x floating point vector instructions
      
      # gpg: Signature made Wed 20 Jul 2022 08:14:50 BST
      # gpg:                using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5
      # gpg:                issuer "thuth@redhat.com"
      # gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full]
      # gpg:                 aka "Thomas Huth <thuth@redhat.com>" [full]
      # gpg:                 aka "Thomas Huth <huth@tuxfamily.org>" [full]
      # gpg:                 aka "Thomas Huth <th.huth@posteo.de>" [unknown]
      # Primary key fingerprint: 27B8 8847 EEE0 2501 18F3  EAB9 2ED9 D774 FE70 2DB5
      
      * tag 'pull-request-2022-07-20' of https://gitlab.com/thuth/qemu:
        tests/tcg/s390x: test signed vfmin/vfmax
        target/s390x: fix NaN propagation rules
        target/s390x: fix handling of zeroes in vfmin/vfmax
      
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      
      # Conflicts:
      #	fpu/softfloat-specialize.c.inc
      f45fd24c
    • Revert "gitlab: disable accelerated zlib for s390x" · db727a14
      Dr. David Alan Gilbert authored
      
      This reverts commit 309df6ac.
      With Ilya's 'multifd: Copy pages before compressing them with zlib'
      in the latest migration series, this shouldn't be a problem any more.
      
      Suggested-by: Peter Maydell <peter.maydell@linaro.org>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      db727a14
    • migration: Avoid false-positive on non-supported scenarios for zero-copy-send · 90eb69e4
      Leonardo Bras authored
      
      Migration with zero-copy-send currently has its limitations, as it can't
      be used with TLS nor any kind of compression. In such scenarios, it should
      output errors during parameter / capability setting.
      
      But currently there are some ways of setting these non-supported scenarios
      without printing the error message:
      
      1) For the 'compression' capability, it works by enabling it together with
      zero-copy-send. This happens because the validity test for zero-copy uses
      the helper function migrate_use_compression(), which checks for compression
      presence in s->enabled_capabilities[MIGRATION_CAPABILITY_COMPRESS].
      
      The point here is: the validity test happens before the capability gets
      enabled. If all of them get enabled together, this test will not return
      error.
      
      In order to fix that, replace migrate_use_compression() by directly testing
      the cap_list parameter of migrate_caps_check() (a standalone sketch of this
      idea follows the list below).
      
      2) For features enabled by parameters such as TLS & 'multifd_compression',
      there was also a possibility of setting non-supported scenarios: setting
      zero-copy-send first, then setting the unsupported parameter.
      
      In order to fix that, also add a check for parameters conflicting with
      zero-copy-send on migrate_params_check().
      
      3) XBZRLE is also a compression capability, so it makes sense to also add
      it to the list of capabilities which are not supported with zero-copy-send.
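
      The following standalone sketch (illustrative names, not the actual QEMU
      patch) shows the idea referenced in point 1: the conflict check looks at
      the capability list being requested rather than at what is already
      enabled, so enabling zero-copy-send together with a compression capability
      is rejected as well:

          #include <stdbool.h>
          #include <stdio.h>

          enum { CAP_COMPRESS, CAP_XBZRLE, CAP_ZERO_COPY_SEND, CAP_MAX };

          /* Validate the requested capability list as a whole. */
          static bool caps_check(const bool cap_list[CAP_MAX])
          {
              if (cap_list[CAP_ZERO_COPY_SEND] &&
                  (cap_list[CAP_COMPRESS] || cap_list[CAP_XBZRLE])) {
                  fprintf(stderr,
                          "zero-copy-send is not compatible with compression\n");
                  return false;
              }
              return true;
          }

          int main(void)
          {
              /* Both capabilities arrive in the same request: the check still
               * fails because it inspects the requested list, not the enabled
               * state. */
              bool requested[CAP_MAX] = { false };
              requested[CAP_COMPRESS] = true;
              requested[CAP_ZERO_COPY_SEND] = true;
              return caps_check(requested) ? 0 : 1;
          }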
      
      Fixes: 1abaec9a ("migration: Change zero_copy_send from migration parameter to migration capability")
      Signed-off-by: Leonardo Bras <leobras@redhat.com>
      Message-Id: <20220719122345.253713-1-leobras@redhat.com>
      Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      90eb69e4
    • multifd: Document the locking of MultiFD{Send/Recv}Params · 4a8f19c9
      Juan Quintela authored
      
      Reorder the structures so we can know if the fields are:
      - Read only
      - Their own locking (i.e. sems)
      - Protected by 'mutex'
      - Only for the multifd channel
      
      Signed-off-by: Juan Quintela <quintela@redhat.com>
      Message-Id: <20220531104318.7494-2-quintela@redhat.com>
      Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
        dgilbert: Typo fixes from Chen Zhang
      4a8f19c9
    • migration/multifd: Report to user when zerocopy not working · d59c40cc
      Leonardo Bras authored
      
      Some errors, like the lack of Scatter-Gather support by the network
      interface (NETIF_F_SG), may cause sendmsg(...,MSG_ZEROCOPY) to fail to use
      zero-copy, which causes it to fall back to the default copying mechanism.
      
      After each full dirty-bitmap scan there should be a zero-copy flush
      happening, which checks for errors in each of the previous calls to
      sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then
      increment the dirty_sync_missed_zero_copy migration stat to let the user
      know about it.
      
      Signed-off-by: Leonardo Bras <leobras@redhat.com>
      Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
      Acked-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20220711211112.18951-4-leobras@redhat.com>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      d59c40cc
    • Add dirty-sync-missed-zero-copy migration stat · cf20c897
      Leonardo Bras authored
      
      Signed-off-by: Leonardo Bras <leobras@redhat.com>
      Acked-by: Markus Armbruster <armbru@redhat.com>
      Acked-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
      Message-Id: <20220711211112.18951-3-leobras@redhat.com>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      cf20c897
    • QIOChannelSocket: Fix zero-copy flush returning code 1 when nothing sent · 927f93e0
      Leonardo Bras authored
      
      If flush is called when no buffer was sent with MSG_ZEROCOPY, it currently
      returns 1. This return code should be used only when Linux fails to use
      MSG_ZEROCOPY on a lot of sendmsg().
      
      Fix this by returning early from flush if no sendmsg(...,MSG_ZEROCOPY)
      was attempted.
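
      A tiny standalone sketch of the intended return-value contract (the
      counters and function name are invented for illustration; this is not the
      QIOChannelSocket code):

          #include <stdio.h>

          /* Return 1 only if zero-copy sends were attempted and every one of
           * them fell back to copying; with nothing queued, return 0. */
          static int zero_copy_flush(unsigned queued, unsigned copied)
          {
              if (queued == 0) {
                  return 0;   /* nothing was sent: not a fallback situation */
              }
              return copied >= queued ? 1 : 0;
          }

          int main(void)
          {
              printf("%d\n", zero_copy_flush(0, 0));  /* 0: nothing to flush */
              printf("%d\n", zero_copy_flush(4, 4));  /* 1: all sends copied */
              return 0;
          }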
      
      Fixes: 2bc58ffc ("QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX")
      Signed-off-by: Leonardo Bras <leobras@redhat.com>
      Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
      Acked-by: Daniel P. Berrangé <berrange@redhat.com>
      Reviewed-by: Juan Quintela <quintela@redhat.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20220711211112.18951-2-leobras@redhat.com>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      927f93e0
    • migration: remove unreachable code after reading data · 5f87072e
      Daniel P. Berrangé authored
      
      The code calls qio_channel_read() in a loop when it reports
      QIO_CHANNEL_ERR_BLOCK. This error is reported when errno == EAGAIN.
      
      As such, the later block of code will always hit the 'errno != EAGAIN'
      condition, making the final 'else' unreachable.
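
      A self-contained sketch of that control flow (the channel function is a
      stub and the names are illustrative, not the QEMU ones):

          #include <errno.h>
          #include <stdio.h>

          #define ERR_BLOCK -2   /* stands in for QIO_CHANNEL_ERR_BLOCK */

          /* Stub: a real channel returns ERR_BLOCK whenever the underlying
           * read fails with EAGAIN; here we fail with a different errno. */
          static int channel_read(char *buf, unsigned len)
          {
              (void)buf;
              (void)len;
              errno = EIO;
              return -1;
          }

          int main(void)
          {
              char buf[64];
              int ret;

              do {
                  ret = channel_read(buf, sizeof(buf));
              } while (ret == ERR_BLOCK);   /* every EAGAIN is retried here */

              if (ret < 0) {
                  if (errno != EAGAIN) {
                      fprintf(stderr, "read failed: %d\n", errno);
                  } else {
                      /* unreachable: the loop above consumed every EAGAIN */
                  }
              }
              return 0;
          }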
      
      Fixes: Coverity CID 1490203
      Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
      Message-Id: <20220627135318.156121-1-berrange@redhat.com>
      Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      5f87072e
    • tests: Add postcopy preempt tests · 8f6fe915
      Peter Xu authored
      
      Four tests are added for preempt mode:
      
        - Postcopy plain
        - Postcopy recovery
        - Postcopy tls
        - Postcopy tls+recovery
      
      Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20220707185530.27801-1-peterx@redhat.com>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
        dgilbert: Manual merge
      8f6fe915
    • tests: Add postcopy tls recovery migration test · 767fa9cf
      Peter Xu authored
      
      It's easy to build this upon the postcopy tls test.  Rename the old
      postcopy recovery test to postcopy/recovery/plain.
      
      Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20220707185527.27747-1-peterx@redhat.com>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
        dgilbert: Manual merge
      767fa9cf
    • tests: Add postcopy tls migration test · d1a27b16
      Peter Xu authored
      
      We just added TLS tests for precopy but not postcopy.  Add the
      corresponding test for vanilla postcopy.
      
      Rename the vanilla postcopy to "postcopy/plain" because all postcopy tests
      will only use unix sockets as channel.
      
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20220707185525.27692-1-peterx@redhat.com>
      Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
        dgilbert: Manual merge
      d1a27b16
    • tests: Move MigrateCommon upper · 312e9dd0
      Peter Xu authored
      
      So that it can be used in postcopy tests too soon.
      
      Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20220707185522.27638-1-peterx@redhat.com>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      312e9dd0
    • migration: Respect postcopy request order in preemption mode · 82b54ef4
      Peter Xu authored
      
      With preemption mode on, when we see a postcopy request for exactly the
      page that we have preempted before (so we've partially sent the page
      already via the PRECOPY channel and it got preempted by another postcopy
      request), currently we drop the request, so that only after all the other
      postcopy requests are serviced do we go back to the precopy stream and
      start to handle it.
      
      We dropped the request because we can't send it via the postcopy channel,
      since the precopy channel already contains part of the data, and we can
      only send a huge page via one channel as a whole.  We can't split a huge
      page into two channels.
      
      That's very much a corner case and it works, but there's a change in the
      order of the postcopy requests that we handle, since we're postponing this
      (unlucky) postcopy request until after the other queued postcopy requests.
      The problem is that when the guest is very busy, the postcopy queue can
      stay non-empty, which means this dropped request may never be handled
      until the end of postcopy migration. So there's a chance that one dest
      QEMU vcpu thread waits on a page fault for an extremely long time just
      because it unluckily accessed the specific page that was preempted before.
      
      The worst case time it needs can be as long as the whole postcopy migration
      procedure.  It's extremely unlikely to happen, but when it happens it's not
      good.
      
      The root cause of this problem is that we treat the pss->postcopy_requested
      variable as carrying two meanings bound together:
      
        1. Whether this page request is urgent, and,
        2. Which channel we should use for this page request.
      
      With the old code, when we set postcopy_requested it means either both (1)
      and (2) are true, or both (1) and (2) are false.  We can never have (1)
      and (2) take different values.
      
      However it doesn't necessarily need to be like that.  It's perfectly legal
      for a request to have (1) very high urgency while (2) we'd still like to
      use the precopy channel, just like the corner case we were discussing
      above.
      
      To differentiate the two meanings better, introduce a new field called
      postcopy_target_channel, showing which channel we should use for this page
      request, so as to cover the old meaning (2) only.  Then we leave the
      postcopy_requested variable to stand only for meaning (1), which is the
      urgency of this page request.
      
      With this change, we can easily boost the priority of a preempted precopy
      page as long as we know that page is also requested as a postcopy page.
      So with the new approach, instead of dropping that request in
      get_queued_page(), we send it right away on the precopy channel, so we get
      back the ordering of the page faults just as they were requested on the
      destination.
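
      A minimal sketch of the split described above (field and type names are
      approximate, not the actual migration/ram.c definitions):

          #include <stdbool.h>
          #include <stdio.h>

          typedef enum { CHANNEL_PRECOPY, CHANNEL_POSTCOPY } TargetChannel;

          typedef struct PageRequest {
              bool postcopy_requested;       /* (1) urgency of the request    */
              TargetChannel target_channel;  /* (2) channel carrying the page */
          } PageRequest;

          int main(void)
          {
              /* The corner case from the text: urgent, yet completed on the
               * precopy channel because that channel already holds part of
               * the huge page. */
              PageRequest preempted = {
                  .postcopy_requested = true,
                  .target_channel = CHANNEL_PRECOPY,
              };
              printf("urgent=%d channel=%s\n", preempted.postcopy_requested,
                     preempted.target_channel == CHANNEL_PRECOPY
                         ? "precopy" : "postcopy");
              return 0;
          }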
      
      Reported-by: Manish Mishra <manish.mishra@nutanix.com>
      Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Reviewed-by: Manish Mishra <manish.mishra@nutanix.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20220707185520.27583-1-peterx@redhat.com>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      82b54ef4
    • migration: Enable TLS for preempt channel · f0afaf6c
      Peter Xu authored
      
      This patch is based on the async preempt channel creation.  It continues
      wiring up the new channel with a TLS handshake to the destination when
      enabled.
      
      Note that only the src QEMU needs such an operation; the dest QEMU does not
      need any change for TLS support because all channels are established
      synchronously there, so all the TLS magic is already properly handled by
      migration_tls_channel_process_incoming().
      
      Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20220707185518.27529-1-peterx@redhat.com>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      f0afaf6c
    • migration: Export tls-[creds|hostname|authz] params to cmdline too · 9a266627
      Peter Xu authored
      
      It's useful to be able to specify the tls credentials all on the cmdline
      (along with the -object tls-creds-*), especially for debugging purposes.
      
      The trick here is we must remember not to free these fields again in the
      finalize() function of the migration object, otherwise it'll cause a
      double-free.
      
      The thing is, when destroying an object, we'll first destroy the properties
      that are bound to the object, then the object itself.  To be explicit, when
      destroying the object in object_finalize() we have this sequence of
      operations:
      
          object_property_del_all(obj);
          object_deinit(obj, ti);
      
      So after this change the two fields are properly released even before
      reaching the finalize() function, in object_property_del_all(); hence we
      don't need to free them anymore in finalize(), or it would be a
      double-free.
      
      This also fixes a trivial memory leak for tls-authz as we forgot to free it
      before this patch.
      
      Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20220707185515.27475-1-peterx@redhat.com>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      9a266627
    • migration: Add helpers to detect TLS capability · 85a8578e
      Peter Xu authored
      
      Add migrate_channel_requires_tls() to detect whether the specific channel
      requires TLS, leveraging the recently introduced migrate_use_tls().  No
      functional change intended.
      
      Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20220707185513.27421-1-peterx@redhat.com>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      85a8578e
    • migration: Add property x-postcopy-preempt-break-huge · c8750de1
      Peter Xu authored
      
      Add a property field that can conditionally disable the "break sending huge
      page" behavior in postcopy preemption.  By default it's enabled.
      
      It should only be used for debugging purposes, and we should never remove
      the "x-" prefix.
      
      Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Reviewed-by: Manish Mishra <manish.mishra@nutanix.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20220707185511.27366-1-peterx@redhat.com>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      c8750de1
    • migration: Create the postcopy preempt channel asynchronously · d0edb8a1
      Peter Xu authored
      
      This patch allows the postcopy preempt channel to be created
      asynchronously.  The benefit is that when the connection is slow, we won't
      take the BQL (and potentially block all things like QMP) for a long time
      without releasing it.
      
      A function postcopy_preempt_wait_channel() is introduced, allowing the
      migration thread to wait on the channel creation.  The channel is always
      created by the main thread, which kicks a new semaphore to tell the
      migration thread that the channel has been created.
      
      We'll need to wait for the new channel in two places: (1) when there's a
      new postcopy migration that is starting, or (2) when there's a postcopy
      migration to resume.
      
      For the start of migration, we don't need to wait for this channel until
      we want to start postcopy, aka postcopy_start().  We'll fail the
      migration if we find that the channel creation failed (which should
      probably not happen at all in 99% of the cases, because the main channel
      uses the same network topology).
      
      For a postcopy recovery, we'll need to wait in postcopy_pause().  In that
      case, if the channel creation failed, we can't fail the migration or we'd
      crash the VM; instead we stay in the PAUSED state, waiting for yet another
      recovery.
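
      A standalone sketch of the hand-off described above (thread and variable
      names are illustrative; the real code uses QEMU's own primitives):

          #include <pthread.h>
          #include <semaphore.h>
          #include <stdbool.h>
          #include <stdio.h>
          #include <unistd.h>

          static sem_t channel_ready;
          static bool channel_ok;

          /* Main thread: creates the (slow) channel without the migration
           * thread holding any global lock, then kicks the semaphore. */
          static void *create_preempt_channel(void *arg)
          {
              (void)arg;
              usleep(100 * 1000);          /* pretend the connect takes a while */
              channel_ok = true;
              sem_post(&channel_ready);
              return NULL;
          }

          /* Migration thread: waits only at the point it needs the channel. */
          static void wait_for_preempt_channel(void)
          {
              sem_wait(&channel_ready);
              printf("preempt channel %s\n", channel_ok ? "ready" : "failed");
          }

          int main(void)
          {
              pthread_t t;

              sem_init(&channel_ready, 0, 0);
              pthread_create(&t, NULL, create_preempt_channel, NULL);
              wait_for_preempt_channel();
              pthread_join(t, NULL);
              sem_destroy(&channel_ready);
              return 0;
          }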
      
      Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Reviewed-by: Manish Mishra <manish.mishra@nutanix.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20220707185509.27311-1-peterx@redhat.com>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      d0edb8a1
    • migration: Postcopy recover with preempt enabled · 60bb3c58
      Peter Xu authored
      
      To allow postcopy recovery, the ram fast load (preempt-only) dest QEMU thread
      needs similar handling on fault tolerance.  When ram_load_postcopy() fails,
      instead of stopping the thread it halts with a semaphore, preparing to be
      kicked again when recovery is detected.
      
      A mutex is introduced to make sure there's no concurrent operation upon the
      socket.  To make it simple, the fast ram load thread will take the mutex during
      its whole procedure, and only release it if it's paused.  The fast-path socket
      will be properly released by the main loading thread safely when there's
      network failures during postcopy with that mutex held.
      
      Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20220707185506.27257-1-peterx@redhat.com>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      60bb3c58
    • migration: Postcopy preemption enablement · c01b16ed
      Peter Xu authored
      
      This patch enables postcopy-preempt feature.
      
      It contains two major changes to the migration logic:
      
      (1) Postcopy requests are now sent via a different socket from precopy
          background migration stream, so as to be isolated from very high page
          request delays.
      
      (2) For huge page enabled hosts: when there are postcopy requests, they can
          now intercept a partial sending of huge host pages on the src QEMU.
      
      After this patch, we'll live migrate a VM with two channels for postcopy: (1)
      PRECOPY channel, which is the default channel that transfers background pages;
      and (2) POSTCOPY channel, which only transfers requested pages.
      
      There's no strict rule of which channel to use, e.g., if a requested page is
      already being transferred on precopy channel, then we will keep using the same
      precopy channel to transfer the page even if it's explicitly requested.  In 99%
      of the cases we'll prioritize the channels so we send requested page via the
      postcopy channel as long as possible.
      
      On the source QEMU, when we find a postcopy request, we'll interrupt the
      PRECOPY channel sending process and quickly switch to the POSTCOPY channel.
      After we have serviced all the high priority postcopy pages, we'll switch
      back to the PRECOPY channel so that we continue to send the interrupted
      huge page again.  There's no new thread introduced on the src QEMU.
      
      On the destination QEMU, one new thread is introduced to receive page data from
      the postcopy specific socket (done in the preparation patch).
      
      This patch has a side effect: after sending postcopy pages, previously we
      would assume the guest will access the following pages, so we kept sending
      from there.  Now it's changed.  Instead of going on with a postcopy
      requested page, we go back and continue sending the precopy huge page
      (which can be intercepted by a postcopy request, so the huge page may have
      been sent partially before).
      
      Whether that's a problem is debatable, because "assuming the guest will
      continue to access the next page" may not really suit when huge pages are
      used, especially if the huge page is large (e.g. 1GB pages).  So that
      locality hint is largely meaningless if huge pages are used.
      
      Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20220707185504.27203-1-peterx@redhat.com>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      c01b16ed
    • migration: Postcopy preemption preparation on channel creation · 36f62f11
      Peter Xu authored
      
      Create a new socket for postcopy, prepared to send postcopy requested
      pages via this specific channel, so as to not get blocked by precopy pages.
      
      A new thread is also created on the dest qemu to receive data from this new
      channel, based on the ram_load_postcopy() routine.
      
      The ram_load_postcopy(POSTCOPY) branch and the thread have not started to
      function yet; that'll be done in follow up patches.
      
      Clean up the new sockets on both src/dst QEMUs, and meanwhile look after
      the new thread too to make sure it'll be recycled properly.
      
      Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
      Reviewed-by: Juan Quintela <quintela@redhat.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20220707185502.27149-1-peterx@redhat.com>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
        dgilbert: With Peter's fix to quieten compiler warning on
             start_migration
      36f62f11
    • migration: Add postcopy-preempt capability · ce5b0f4a
      Peter Xu authored
      
      Firstly, postcopy already preempts precopy due to the fact that we do
      unqueue_page() first before looking into dirty bits.
      
      However that's not enough, e.g., when host huge pages are enabled: when
      sending a precopy huge page, a postcopy request needs to wait until the
      whole huge page that is being sent finishes.  That could introduce quite
      some delay; the bigger the huge page, the larger the delay it brings.
      
      This patch adds a new capability to allow postcopy requests to preempt existing
      precopy page during sending a huge page, so that postcopy requests can be
      serviced even faster.
      
      Meanwhile to send it even faster, bypass the precopy stream by providing a
      standalone postcopy socket for sending requested pages.
      
      Since the new behavior will not be compatible with the old behavior, this will
      not be the default, it's enabled only when the new capability is set on both
      src/dst QEMUs.
      
      This patch only adds the capability itself, the logic will be added in follow
      up patches.
      
      Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Reviewed-by: Juan Quintela <quintela@redhat.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20220707185342.26794-2-peterx@redhat.com>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      ce5b0f4a
    • multifd: Copy pages before compressing them with zlib · 007e179e
      Ilya Leoshkevich authored
      zlib_send_prepare() compresses pages of a running VM. zlib does not
      make any thread-safety guarantees with respect to changing deflate()
      input concurrently with deflate() [1].
      
      One can observe problems due to this with the IBM zEnterprise Data
      Compression accelerator capable zlib [2]. When the hardware
      acceleration is enabled, migration/multifd/tcp/plain/zlib test fails
      intermittently [3] due to sliding window corruption. The accelerator's
      architecture explicitly discourages concurrent accesses [4]:
      
          Page 26-57, "Other Conditions":
      
          As observed by this CPU, other CPUs, and channel
          programs, references to the parameter block, first,
          second, and third operands may be multiple-access
          references, accesses to these storage locations are
          not necessarily block-concurrent, and the sequence
          of these accesses or references is undefined.
      
      Mark Adler pointed out that vanilla zlib performs double fetches under
      certain circumstances as well [5], therefore we need to copy data
      before passing it to deflate().
      
      [1] https://zlib.net/manual.html
      [2] https://github.com/madler/zlib/pull/410
      [3] https://lists.nongnu.org/archive/html/qemu-devel/2022-03/msg03988.html
      [4] http://publibfp.dhe.ibm.com/epubs/pdf/a227832c.pdf
      [5] https://lists.gnu.org/archive/html/qemu-devel/2022-07/msg00889.html
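
      For illustration, a self-contained sketch of the copy-before-compress rule
      (assumed buffer names, not the multifd code); the snapshot buffer is
      private to the sender, so deflate() never reads memory that another thread
      may be changing:

          #include <stdio.h>
          #include <string.h>
          #include <zlib.h>

          #define PAGE_SIZE 4096

          int main(void)
          {
              unsigned char live_page[PAGE_SIZE] = "guest data that may change";
              unsigned char snapshot[PAGE_SIZE];
              unsigned char out[PAGE_SIZE * 2];
              z_stream zs = { 0 };

              if (deflateInit(&zs, Z_DEFAULT_COMPRESSION) != Z_OK) {
                  return 1;
              }

              /* Copy first: the compressor only ever sees the stable copy. */
              memcpy(snapshot, live_page, PAGE_SIZE);

              zs.next_in = snapshot;
              zs.avail_in = PAGE_SIZE;
              zs.next_out = out;
              zs.avail_out = sizeof(out);

              if (deflate(&zs, Z_FINISH) != Z_STREAM_END) {
                  deflateEnd(&zs);
                  return 1;
              }
              printf("compressed %d bytes into %lu\n", PAGE_SIZE,
                     (unsigned long)zs.total_out);
              deflateEnd(&zs);
              return 0;
          }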
      
      Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Message-Id: <20220705203559.2960949-1-iii@linux.ibm.com>
      Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      007e179e
    • tests: Add dirty page rate limit test · 8aff6f50
      Hyman Huang authored
      
      Add a dirty page rate limit test if the kernel supports dirty ring.
      
      The following qmp commands are covered by this test case:
      "calc-dirty-rate", "query-dirty-rate", "set-vcpu-dirty-limit",
      "cancel-vcpu-dirty-limit" and "query-vcpu-dirty-limit".
      
      Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
      Acked-by: Peter Xu <peterx@redhat.com>
      Message-Id: <eed5b847a6ef0a9c02a36383dbdd7db367dd1e7e.1656177590.git.huangy81@chinatelecom.cn>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      8aff6f50
    • softmmu/dirtylimit: Implement dirty page rate limit · f3b2e38c
      Hyman Huang authored
      
      Implement dirty rate calculation periodically based on the dirty ring and
      throttle the virtual CPU until it reaches the quota dirty page rate given
      by the user.
      
      Introduce the qmp commands "set-vcpu-dirty-limit",
      "cancel-vcpu-dirty-limit", "query-vcpu-dirty-limit"
      to enable, disable and query the dirty page limit for virtual CPUs.
      
      Meanwhile, introduce the corresponding hmp commands
      "set_vcpu_dirty_limit", "cancel_vcpu_dirty_limit" and
      "info vcpu_dirty_limit" so the feature can be more usable.
      
      "query-vcpu-dirty-limit" success depends on enabling the dirty page rate
      limit, so just add it to the list of skipped commands to ensure
      qmp-cmd-test runs successfully.
      
      Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
      Acked-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Message-Id: <4143f26706d413dd29db0b672fe58b3d3fbe34bc.1656177590.git.huangy81@chinatelecom.cn>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      f3b2e38c
    • softmmu/dirtylimit: Implement virtual CPU throttle · baa60983
      Hyman Huang authored
      
      Set up a negative feedback system for when the vCPU thread handles a
      KVM_EXIT_DIRTY_RING_FULL exit, by introducing a throttle_us_per_full
      field in struct CPUState.  Sleep throttle_us_per_full microseconds to
      throttle the vCPU if dirtylimit is in service.
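
      A minimal sketch of that throttle step (struct and field names are
      illustrative, not the actual CPUState):

          #include <stdint.h>
          #include <stdio.h>
          #include <unistd.h>

          typedef struct VcpuSketch {
              int64_t throttle_us_per_full;  /* adjusted by the feedback loop */
          } VcpuSketch;

          /* Called when the vCPU returns from a "dirty ring full" exit. */
          static void handle_dirty_ring_full(VcpuSketch *cpu)
          {
              if (cpu->throttle_us_per_full > 0) {
                  usleep((useconds_t)cpu->throttle_us_per_full);
              }
          }

          int main(void)
          {
              VcpuSketch cpu = { .throttle_us_per_full = 2000 };
              handle_dirty_ring_full(&cpu);
              printf("slept %lld us\n", (long long)cpu.throttle_us_per_full);
              return 0;
          }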
      
      Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Message-Id: <977e808e03a1cef5151cae75984658b6821be618.1656177590.git.huangy81@chinatelecom.cn>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      baa60983
    • accel/kvm/kvm-all: Introduce kvm_dirty_ring_size function · 4a06a7cc
      Hyman Huang authored
      
      Introduce the kvm_dirty_ring_size util function to help calculate the
      dirty ring full time.
      
      Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
      Acked-by: Peter Xu <peterx@redhat.com>
      Message-Id: <f9ce1f550bfc0e3a1f711e17b1dbc8f701700e56.1656177590.git.huangy81@chinatelecom.cn>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      4a06a7cc
    • softmmu/dirtylimit: Implement vCPU dirtyrate calculation periodically · cc2b33ea
      Hyman Huang authored
      
      Introduce the third method, GLOBAL_DIRTY_LIMIT, of dirty tracking, to
      calculate the dirty rate periodically for the dirty page rate limit.
      
      Add dirtylimit.c to implement the periodic dirty rate calculation, which
      will be used for the dirty page rate limit.
      
      Add dirtylimit.h to export util functions for the dirty page rate limit
      implementation.
      
      Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Message-Id: <5d0d641bffcb9b1c4cc3e323b6dfecb36050d948.1656177590.git.huangy81@chinatelecom.cn>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      cc2b33ea
    • migration/dirtyrate: Refactor dirty page rate calculation · 8244166d
      Hyman Huang authored
      
      - Abstract out the dirty log change logic into the function
        global_dirty_log_change.
      
      - Abstract out the dirty page rate calculation logic via dirty-ring into
        the function vcpu_calculate_dirtyrate.
      
      - Abstract out the mathematical dirty page rate calculation into
        do_calculate_dirtyrate, decoupling it from DirtyStat.
      
      - Rename set_sample_page_period to dirty_stat_wait, which is better
        understood and will be reused in dirtylimit.
      
      - Handle the cpu hotplug/unplug scenario during measurement of the dirty
        page rate.
      
      - Export util functions outside migration.
      
      Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Message-Id: <7b6f6f4748d5b3d017b31a0429e630229ae97538.1656177590.git.huangy81@chinatelecom.cn>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      8244166d
    • cpus: Introduce cpu_list_generation_id · ab1a161f
      Hyman Huang authored
      
      Introduce cpu_list_generation_id to track the cpu list generation so
      that cpu hotplug/unplug can be detected during measurement of the
      dirty page rate.
      
      cpu_list_generation_id can be used to detect changes of the cpu list,
      in preparation for dirty page rate measurement.
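
      A minimal standalone sketch of the generation-id idea (illustrative, not
      the QEMU implementation):

          #include <stdatomic.h>
          #include <stdbool.h>
          #include <stdio.h>

          static atomic_uint cpu_list_generation_id;

          /* Any hotplug or unplug bumps the generation. */
          static void cpu_list_changed(void)
          {
              atomic_fetch_add(&cpu_list_generation_id, 1);
          }

          int main(void)
          {
              unsigned before = atomic_load(&cpu_list_generation_id);

              cpu_list_changed();           /* hotplug happens mid-measurement */

              bool stale = atomic_load(&cpu_list_generation_id) != before;
              printf("measurement %s\n", stale ? "must be retried" : "is valid");
              return 0;
          }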
      
      Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Message-Id: <06e1f1362b2501a471dce796abb065b04f320fa5.1656177590.git.huangy81@chinatelecom.cn>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      ab1a161f
    • accel/kvm/kvm-all: Refactor per-vcpu dirty ring reaping · 1667e2b9
      Hyman Huang authored
      
      Add a non-required argument 'CPUState' to kvm_dirty_ring_reap() so
      that it can cover the single-vcpu dirty-ring-reaping scenario.
      
      Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Message-Id: <c32001242875e83b0d9f78f396fe2dcd380ba9e8.1656177590.git.huangy81@chinatelecom.cn>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      1667e2b9
    • Merge tag 'pull-hex-20220719-1' of https://github.com/quic/qemu into staging · 1f64dd76
      Peter Maydell authored
      Recall that the semantics of a Hexagon mem_noshuf packet are that the
      store effectively happens before the load.  There are two bug fixes
      in this series.
      
      # gpg: Signature made Tue 19 Jul 2022 22:25:19 BST
      # gpg:                using RSA key 3635C788CE62B91FD4C59AB47B0244FB12DE4422
      # gpg: Good signature from "Taylor Simpson (Rock on) <tsimpson@quicinc.com>" [undefined]
      # gpg: WARNING: This key is not certified with a trusted signature!
      # gpg:          There is no indication that the signature belongs to the owner.
      # Primary key fingerprint: 3635 C788 CE62 B91F D4C5  9AB4 7B02 44FB 12DE 4422
      
      * tag 'pull-hex-20220719-1' of https://github.com/quic/qemu:
        Hexagon (target/hexagon) fix bug in mem_noshuf load exception
        Hexagon (target/hexagon) fix store w/mem_noshuf & predicated load
      
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      1f64dd76
    • net/colo.c: fix segmentation fault when packet is not parsed correctly · 8bdab83b
      Zhang Chen authored
      
      When COLO uses only one vnet_hdr_support parameter between
      filter-redirector and filter-mirror (or colo-compare), COLO will crash
      with a segmentation fault. Backtrace as follows:
      
      Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
      0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
          at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
      296         uint16_t proto = be16_to_cpu(PKT_GET_ETH_HDR(p)->h_proto);
      (gdb) bt
      0  0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
          at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
      1  0x0000555555cb22b4 in parse_packet_early (pkt=0x555556a44840) at
      net/colo.c:49
      2  0x0000555555cb2b91 in is_tcp_packet (pkt=0x555556a44840) at
      net/filter-rewriter.c:63
      
      So a wrong vnet_hdr_len will cause pkt->data to become NULL. Add a check
      to raise an error and add trace-events to track vnet_hdr_len.
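
      A standalone sketch of such a check (field names are assumptions for the
      example, not the COLO Packet definition): refuse to parse when the vnet
      header length leaves no payload, instead of dereferencing NULL later.

          #include <stddef.h>
          #include <stdint.h>
          #include <stdio.h>

          typedef struct PacketSketch {
              uint8_t *data;        /* payload after the vnet header */
              size_t size;
              size_t vnet_hdr_len;
          } PacketSketch;

          static int parse_packet_early(const PacketSketch *pkt)
          {
              if (!pkt->data || pkt->size <= pkt->vnet_hdr_len) {
                  fprintf(stderr,
                          "packet not parsed correctly (vnet_hdr_len=%zu)\n",
                          pkt->vnet_hdr_len);
                  return -1;  /* report instead of reading the header from NULL */
              }
              return 0;
          }

          int main(void)
          {
              PacketSketch bad = { .data = NULL, .size = 0, .vnet_hdr_len = 10 };
              return parse_packet_early(&bad) < 0 ? 0 : 1; /* check must fire */
          }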
      
      Signed-off-by: Tao Xu <tao3.xu@intel.com>
      Signed-off-by: Zhang Chen <chen.zhang@intel.com>
      Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      8bdab83b
    • net/colo.c: No need to track conn_list for filter-rewriter · 94c36c48
      Zhang Chen authored
      
      Filter-rewriter does not need to track connections in conn_list.
      This patch fixes the glib g_queue_is_empty assertion when a COLO guest
      keeps a lot of network connections.
      
      Signed-off-by: Zhang Chen <chen.zhang@intel.com>
      Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      94c36c48
    • net/colo: Fix a "double free" crash to clear the conn_list · a18d4369
      Zhang Chen authored
      
      We noticed that QEMU may crash when the guest has too many
      incoming network connections, with the following log:
      
      15197@1593578622.668573:colo_proxy_main : colo proxy connection hashtable full, clear it
      free(): invalid pointer
      [1]    15195 abort (core dumped)  qemu-system-x86_64 ....
      
      This is because we create the s->connection_track_table with
      g_hash_table_new_full() which is defined as:
      
      GHashTable * g_hash_table_new_full (GHashFunc hash_func,
                             GEqualFunc key_equal_func,
                             GDestroyNotify key_destroy_func,
                             GDestroyNotify value_destroy_func);
      
      The fourth parameter, connection_destroy(), will be called to free the
      memory allocated for all 'Connection' values in the hashtable when
      we call g_hash_table_remove_all() in connection_hashtable_reset().
      
      But both connection_track_table and conn_list reference the same
      conn instance, which triggers a double free when conn_list is cleared. So
      this patch removes the free action on the hash table side to avoid
      double-freeing the conn.
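
      A standalone illustration of the ownership rule (assumed names, not the
      COLO code): when the same allocation sits in both a hash table and a
      list, only one container may own the free (link against glib-2.0 to
      build):

          #include <glib.h>

          typedef struct Connection { int id; } Connection;

          int main(void)
          {
              GQueue *conn_list = g_queue_new();
              /* No value_destroy_func: the table only references the values,
               * ownership stays with conn_list. */
              GHashTable *table = g_hash_table_new_full(g_direct_hash,
                                                        g_direct_equal,
                                                        NULL, NULL);

              Connection *conn = g_new0(Connection, 1);
              g_queue_push_tail(conn_list, conn);
              g_hash_table_insert(table, GINT_TO_POINTER(1), conn);

              g_hash_table_remove_all(table);        /* safe: conn not freed  */
              g_queue_free_full(conn_list, g_free);  /* single owner frees it */
              g_hash_table_destroy(table);
              return 0;
          }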
      
      Signed-off-by: Like Xu <like.xu@linux.intel.com>
      Signed-off-by: Zhang Chen <chen.zhang@intel.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      a18d4369