- Nov 30, 2023
-
-
Fabiano Rosas authored
This is being shadowed by the assignments at multifd_channel_connect() and multifd_tls_channel_connect(). Signed-off-by:
Fabiano Rosas <farosas@suse.de> Message-ID: <20231110200241.20679-2-farosas@suse.de> Signed-off-by:
Juan Quintela <quintela@redhat.com>
-
- Oct 31, 2023
-
-
Juan Quintela authored
After the last commit, it is a write-only variable. Reviewed-by:
Fabiano Rosas <farosas@suse.de> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231025091117.6342-12-quintela@redhat.com>
-
- Oct 20, 2023
-
-
Fabiano Rosas authored
We don't need to check p->quit in multifd_send_thread() because it is shadowed by the 'exiting' flag. Ever since that flag was added, p->quit has been obsolete as a way to stop the thread. Since p->quit is set at multifd_send_terminate_threads() under the p->mutex lock, the thread will only see it once it loops, so 'exiting' will always be seen first. Note that setting p->quit at multifd_send_terminate_threads() still makes sense, because we need a way to inform multifd_send_pages() that the channel has stopped (see the sketch below). Signed-off-by:
Fabiano Rosas <farosas@suse.de> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231012140651.13122-3-farosas@suse.de>
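To make the ordering argument above concrete, here is a small self-contained C model of the two stop mechanisms. The names exiting, quit, and the per-channel mutex/semaphore mirror the commit message; everything else (types, helpers, structure) is illustrative and not QEMU code.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool exiting;      /* process-wide flag, checked first by the thread */

typedef struct {
    pthread_mutex_t mutex;
    sem_t sem;
    bool quit;                   /* only consulted by the page-queueing side */
} SendChannel;

static void *send_thread(void *opaque)
{
    SendChannel *p = opaque;

    for (;;) {
        sem_wait(&p->sem);
        /* 'exiting' is always observed before p->quit could be, so the
         * thread never needs to test p->quit at all. */
        if (atomic_load(&exiting)) {
            break;
        }
        /* ... prepare and send one packet ... */
    }
    return NULL;
}

static void terminate_threads(SendChannel *channels, int n)
{
    atomic_store(&exiting, true);    /* this is what actually stops the threads */

    for (int i = 0; i < n; i++) {
        SendChannel *p = &channels[i];

        pthread_mutex_lock(&p->mutex);
        p->quit = true;              /* still set: tells the queueing code that
                                      * this channel has stopped */
        pthread_mutex_unlock(&p->mutex);
        sem_post(&p->sem);
    }
}
```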
-
- Oct 17, 2023
-
-
Fabiano Rosas authored
The function is currently called from two sites: one always gives it a NULL Error and the other always gives it a non-NULL Error. In the non-NULL case, all it does is trace the error and return. One of the callers already has tracing; add a tracepoint to the other and stop passing the error into the function. Cc: Markus Armbruster <armbru@redhat.com> Signed-off-by:
Fabiano Rosas <farosas@suse.de> Reviewed-by:
Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231012134343.23757-4-farosas@suse.de>
-
Fabiano Rosas authored
The preferred usage of the Error type is to always set both the return code and the error when a failure happens. As all code called from the send thread follows this pattern, we'll always have the return code and the error set at the same time. Aside from the convention, in this piece of code this must be the case, otherwise the 'if (ret != 0)' path would exit the thread without calling multifd_send_terminate_threads(), which is incorrect. Unify both paths to make it clear that both are taken when there's an error (see the sketch below). Signed-off-by:
Fabiano Rosas <farosas@suse.de> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231012134343.23757-3-farosas@suse.de>
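A hedged sketch of the convention and of the unified path; multifd_send_terminate_threads() is the function named in the message, while the other names are illustrative declarations only.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct Error Error;

int send_one_packet(Error **errp);                 /* illustrative callee */
void multifd_send_terminate_threads(Error *err);   /* name from the message */

bool send_loop_iteration(void)
{
    Error *local_err = NULL;
    int ret = send_one_packet(&local_err);

    if (ret != 0) {
        /* By convention ret != 0 implies local_err != NULL, so one branch
         * covers both: the thread never exits without first terminating
         * its siblings. */
        multifd_send_terminate_threads(local_err);
        return false;
    }
    return true;
}
```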
-
Fabiano Rosas authored
We're about to enable support for other transports in multifd, so remove direct references to sockets. Signed-off-by:
Fabiano Rosas <farosas@suse.de> Reviewed-by:
Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231012134343.23757-2-farosas@suse.de>
-
Elena Ufimtseva authored
Sometimes multifd sends just a sync packet with no pages (normal_num is 0). In this case the old value is preserved and accounted for, while only packet_len is actually transferred. Reset it to 0 after sending and accounting for it. Signed-off-by:
Elena Ufimtseva <elena.ufimtseva@oracle.com> Reviewed-by:
Fabiano Rosas <farosas@suse.de> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231011184358.97349-5-elena.ufimtseva@oracle.com>
-
Elena Ufimtseva authored
Previous commit cbec7eb7 "migration/multifd: Compute transferred bytes correctly" removed accounting for packet_len in the non-rdma case, but next_packet_size only accounts for the pages (normal_pages * PAGE_SIZE), not for the packet header that is being sent as iov[0]. The packet_len part should be added to account for the size of MultiFDPacket and the array of the offsets (see the sketch below). Signed-off-by:
Elena Ufimtseva <elena.ufimtseva@oracle.com> Reviewed-by:
Fabiano Rosas <farosas@suse.de> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231011184358.97349-4-elena.ufimtseva@oracle.com>
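Arithmetically, the accounting the two fixes above converge on looks roughly like this fragment; the field names follow the messages, while the counter and the surrounding context are schematic.

```c
/* Schematic fragment, not the actual QEMU accounting code. */
uint64_t bytes_this_send = p->packet_len          /* MultiFDPacket header + offsets array */
                         + p->next_packet_size;   /* the pages (normal pages * page size) */

transferred_bytes += bytes_this_send;

/* A sync-only packet carries no pages: reset the size once it has been sent
 * and accounted for, so a stale value from the previous send is not counted
 * again on the next sync. */
p->next_packet_size = 0;
```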
-
- Jul 26, 2023
-
-
Fabiano Rosas authored
We're about to add more functions to this file, so make it use the same coding style as the rest of the code. Signed-off-by:
Fabiano Rosas <farosas@suse.de> Reviewed-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by:
Peter Xu <peterx@redhat.com> Message-Id: <20230607161306.31425-2-farosas@suse.de> Signed-off-by:
Juan Quintela <quintela@redhat.com>
-
- Jul 25, 2023
-
-
Michael Tokarev authored
Signed-off-by:
Michael Tokarev <mjt@tls.msk.ru> Reviewed-by:
Fabiano Rosas <farosas@suse.de>
-
- Jun 05, 2023
-
-
Philippe Mathieu-Daudé authored
Mechanical change running Coccinelle spatch with content generated from the qom-cast-macro-clean-cocci-gen.py added in the previous commit. Suggested-by:
Markus Armbruster <armbru@redhat.com> Signed-off-by:
Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20230601093452.38972-3-philmd@linaro.org> Reviewed-by:
Richard Henderson <richard.henderson@linaro.org> Signed-off-by:
Thomas Huth <thuth@redhat.com>
-
- May 18, 2023
-
-
Juan Quintela authored
In the past, we had to put all the size-related operations in the main thread because qemu_file was not thread safe. Now that all counters are atomic, we can update them just after we do the write. As an additional bonus, we are able to use the right value for the compression methods; until now we were assuming there was no compression at all. Signed-off-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Peter Xu <peterx@redhat.com> Message-Id: <20230515195709.63843-17-quintela@redhat.com>
-
Juan Quintela authored
Since the previous commit, we calculate how much data we have sent with migration_transferred_bytes(), so there is no need to maintain this counter and remember to always update it. Signed-off-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Cédric Le Goater <clg@kaod.org> Message-Id: <20230515195709.63843-10-quintela@redhat.com>
-
Juan Quintela authored
This way we can make them atomic and use these functions from anywhere. I also moved all functions that use rate_limit to migration-stats. The functions got renamed, as they are not qemu_file anymore (a usage sketch follows below):
qemu_file_rate_limit -> migration_rate_exceeded
qemu_file_set_rate_limit -> migration_rate_set
qemu_file_get_rate_limit -> migration_rate_get
qemu_file_reset_rate_limit -> migration_rate_reset
qemu_file_acct_rate_limit -> migration_rate_account
Reviewed-by:
Harsh Prateek Bora <harshpb@linux.ibm.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-Id: <20230515195709.63843-6-quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com>
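For orientation, a short usage sketch with the new names; the names come from the list above, but the signatures shown and the helpers around them are assumptions rather than something verified against the tree.

```c
/* Illustrative send loop; have_more_dirty_pages()/send_next_page() are
 * hypothetical helpers. */
while (have_more_dirty_pages()) {
    if (migration_rate_exceeded(file)) {    /* was: qemu_file_rate_limit() */
        break;                              /* bandwidth budget used up for now */
    }
    send_next_page(file);
}

migration_rate_set(limit_bytes);            /* was: qemu_file_set_rate_limit() */
```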
-
- May 10, 2023
-
-
Lukas Straub authored
This will be used in the next commits to add colo support to multifd. Signed-off-by:
Lukas Straub <lukasstraub2@web.de> Reviewed-by:
Juan Quintela <quintela@redhat.com> Message-Id: <88135197411df1a71d7832962b39abf60faf0021.1683572883.git.lukasstraub2@web.de> Signed-off-by:
Juan Quintela <quintela@redhat.com>
-
- May 03, 2023
-
-
Juan Quintela authored
It is not needed since we moved the accessor for tls properties to options.c. Suggested-by:
Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
-
Juan Quintela authored
migration_stats is just too long, and it is going to have more than ram counters in the near future. Signed-off-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Lukas Straub <lukasstraub2@web.de>
-
Juan Quintela authored
There is already include/qemu/stats.h, so stats.h was a bad idea. We want this file to not depend on anything else; we will move all the migration counters/stats into this struct. Signed-off-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Lukas Straub <lukasstraub2@web.de>
-
Juan Quintela authored
Signed-off-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Lukas Straub <lukasstraub2@web.de>
-
- Apr 27, 2023
-
-
Juan Quintela authored
We don't wait on the sem when we are doing a sync_main. Make it wait there. To make things clearer, we mark the channel ready at the beginning of the thread loop (see the sketch below). Signed-off-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Fabiano Rosas <farosas@suse.de>
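A rough model of the reordering; channels_ready and the loop shape follow the usual multifd naming, but this is a sketch, not the actual thread function.

```c
static void *send_thread(void *opaque)
{
    Channel *p = opaque;   /* schematic per-channel state */

    for (;;) {
        /* Moved to the beginning of the loop: a sender blocked in sync_main
         * waiting on channels_ready is now woken for sync-only iterations
         * as well. */
        sem_post(&channels_ready);

        sem_wait(&p->sem);
        if (should_exit()) {
            break;
        }
        /* ... send queued pages or a sync packet ... */
    }
    return NULL;
}
```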
-
- Apr 24, 2023
-
-
Juan Quintela authored
While we are at it, rename the function to migrate_zero_copy_send() to be consistent with all other capabilities. We can remove the CONFIG_LINUX guard; we already check in migrate_caps_check() that this capability can't be set up. Signed-off-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
-
Juan Quintela authored
While we are at it, rename the function to migrate_multifd() to be consistent with all other capabilities. Signed-off-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
-
Juan Quintela authored
Signed-off-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Peter Xu <peterx@redhat.com>
-
Juan Quintela authored
In the spirit of:
commit 394d323bc3451e4d07f13341cb8817fac8dfbadd
Author: Peter Xu <peterx@redhat.com>
Date: Tue Oct 11 17:55:51 2022 -0400
    migration: Use atomic ops properly for page accountings
Reviewed-by:
David Edmondson <david.edmondson@oracle.com> Reviewed-by:
Peter Xu <peterx@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com>
-
Juan Quintela authored
Reviewed-by:
David Edmondson <david.edmondson@oracle.com> Reviewed-by:
Peter Xu <peterx@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com>
-
Juan Quintela authored
Using MigrationStats as the type for ram_counters meant that we didn't have to re-declare each value in another struct. The need for atomic counters made us create MigrationAtomicStats for those atomic counters. Create a RAMStats type which is a merge of MigrationStats and MigrationAtomicStats, removing unused members. Signed-off-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Peter Xu <peterx@redhat.com> --- Fix typos found by David Edmondson
-
- Mar 16, 2023
-
-
Wei Wang authored
The p->flags could be updated via the send_prepare callback, e.g. OR-ed with MULTIFD_FLAG_ZLIB via zlib_send_prepare. Assigning p->flags to the local "flags" before the send_prepare callback could therefore capture only part of p->flags. Fix it by moving the assignment of p->flags to the local flags to after the callback, so that the correct flags can be traced (see the fragment below). Fixes: ab7cbb0b ("multifd: Make no compression operations into its own structure") Signed-off-by:
Wei Wang <wei.w.wang@intel.com> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com>
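The fix boils down to an ordering change around the callback. A hedged before/after fragment (the field and callback names follow the message; the surrounding code is abridged):

```c
/* Before (buggy): the snapshot is taken before the callback, so bits that
 * send_prepare ORs in (e.g. MULTIFD_FLAG_ZLIB) never reach the tracepoint. */
flags = p->flags;
ret = multifd_send_state->ops->send_prepare(p, &local_err);

/* After (fixed): let the callback finalize p->flags, then snapshot it. */
ret = multifd_send_state->ops->send_prepare(p, &local_err);
flags = p->flags;
```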
-
- Feb 13, 2023
-
-
Leonardo Bras authored
Currently, running migration_incoming_state_destroy() without first running multifd_load_cleanup() will cause a yank error: qemu-system-x86_64: ../util/yank.c:107: yank_unregister_instance: Assertion `QLIST_EMPTY(&entry->yankfns)' failed. (core dumped) The above error happens on the target host when multifd is being used for precopy, and then postcopy is triggered and the migration finishes. This will crash the VM in the target host. To avoid that, move multifd_load_cleanup() inside migration_incoming_state_destroy(), so that the load cleanup becomes part of the incoming-state destroying process. Running multifd_load_cleanup() twice could become an issue, though, but the only scenario where it could be run twice is in process_incoming_migration_bh(), so removing that extra call is necessary. On the other hand, this multifd_load_cleanup() call happens way before migration_incoming_state_destroy(), and having it happen before dirty_bitmap_mig_before_vm_start() and vm_start() may be needed. So introduce a new function, multifd_load_shutdown(), that will mainly stop all multifd threads and close their QIOChannels. Then use this function instead of multifd_load_cleanup() to make sure nothing else is received before dirty_bitmap_mig_before_vm_start() (see the sketch below). Fixes: b5eea99e ("migration: Add yank feature") Reported-by:
Li Xiaohui <xiaohli@redhat.com> Signed-off-by:
Leonardo Bras <leobras@redhat.com> Reviewed-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Peter Xu <peterx@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com>
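The resulting split can be pictured like this; the two function names come from the message, but the bodies here only indicate the division of labour, not the exact implementation.

```c
void multifd_load_shutdown(void)
{
    /* Stop the recv threads and close their QIOChannels, so nothing more is
     * received before dirty_bitmap_mig_before_vm_start() and vm_start(). */
    multifd_recv_terminate_threads(NULL);
}

void multifd_load_cleanup(void)
{
    /* Runs later, from inside migration_incoming_state_destroy(): join the
     * threads and free per-channel state, which also lets the yank instance
     * unregister cleanly. */
}
```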
-
Leonardo Bras authored
The current approach will only join threads that are still running. For the threads not joined, resources or private memory are kept in the process space and never reclaimed before process end, and this risks serious memory leaks. This should usually not be a big problem, since multifd migration is usually run at most a few times, and after it succeeds there is not much to be done before exiting the process. Still, it should not hurt performance to join all of them. Fixes: b5eea99e ("migration: Add yank feature") Reported-by:
Li Xiaohui <xiaohli@redhat.com> Signed-off-by:
Leonardo Bras <leobras@redhat.com> Reviewed-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Peter Xu <peterx@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com>
-
Leonardo Bras authored
Before assigning "p->quit = true" for every multifd channel, multifd_load_cleanup() calls multifd_recv_terminate_threads(), which already does the same assignment while protected by a mutex. So there is no point doing the same assignment again. Fixes: b5eea99e ("migration: Add yank feature") Reported-by:
Li Xiaohui <xiaohli@redhat.com> Signed-off-by:
Leonardo Bras <leobras@redhat.com> Reviewed-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Peter Xu <peterx@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com>
-
Leonardo Bras authored
Since its introduction in commit f986c3d2 ("migration: Create multifd migration threads"), multifd_load_cleanup() has never returned any value other than 0, nor set any error on errp. Even so, an if clause in process_incoming_migration_bh() uses its return value to decide on setting autostart = false, which will never happen. In order to simplify the codebase, change the multifd_load_cleanup() signature to 'void multifd_load_cleanup(void)', and for every usage remove the error handling or decisions made based on a return value != 0 (see the sketch below). Fixes: b5eea99e ("migration: Add yank feature") Reported-by:
Li Xiaohui <xiaohli@redhat.com> Signed-off-by:
Leonardo Bras <leobras@redhat.com> Reviewed-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Peter Xu <peterx@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com>
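In terms of the interface, the change amounts to the following; the old prototype is paraphrased from the message.

```c
/* Old: the return value and errp were never meaningfully used. */
int multifd_load_cleanup(Error **errp);

/* New: */
void multifd_load_cleanup(void);
```

Callers such as process_incoming_migration_bh() then simply drop the dead branch that set autostart = false based on the old return value.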
-
- Feb 11, 2023
-
-
Peter Xu authored
The whole idea of multi-channel checks was not properly done, IMHO. Currently we check multi-channel in a lot of places, but actually that's not needed, because we only need to check it right after we get the URI and that should be it. If the URI check succeeds, we should never need to check again because we must have it. If the check fails, we should fail immediately on either qmp_migrate or qmp_migrate_incoming, instead of failing later after the connection is established. Nor should we fail setting any capabilities like we used to do here: 5ad15e86 ("migration: allow enabling mutilfd for specific protocol only", 2021-10-19) Because logically the URI will only be set after the capability is set, it doesn't make a lot of sense to check the URI type when setting the capability: we would be checking the cap against an old URI passed in, which may not even be the URI we're going to use later. This patch mostly reverts all such checks from before, dropping the variable migrate_allow_multi_channels and its helpers. Instead, add a common helper to check the URI for multi-channels for both qmp_migrate and qmp_migrate_incoming, which should do all the proper checks. The failure will only trigger with the "migrate" or "migrate_incoming" command, or when the user specified "-incoming xxx" where "xxx" is not "defer". Signed-off-by:
Peter Xu <peterx@redhat.com> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com>
-
Li Zhang authored
Clean up some unnecessary code. Signed-off-by:
Li Zhang <lizhang@suse.de> Signed-off-by:
Juan Quintela <quintela@redhat.com>
-
Li Zhang authored
Clean up multifd_channel_connect(). Signed-off-by:
Li Zhang <lizhang@suse.de> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com>
-
- Feb 06, 2023
-
-
Jiang Jiacheng authored
To support querying migration thread information, save and delete thread (live_migration and multifdsend) information at thread creation and finish. Signed-off-by:
Jiang Jiacheng <jiangjiacheng@huawei.com> Signed-off-by:
Juan Quintela <quintela@redhat.com>
-
Zhenzhong Duan authored
Make the IO channel flush call after the inflight request has been drained in the multifd thread, or else we may fail to flush the inflight request (see the sketch below). Signed-off-by:
Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com>
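Schematically, the ordering inside the multifd send thread becomes "drain, then flush". qio_channel_flush() is the flush call referred to above; the draining helpers are illustrative.

```c
/* Drain the in-flight (e.g. zero-copy) send requests first ... */
while (pending_send_requests(p)) {          /* hypothetical helper */
    wait_for_one_completion(p);             /* hypothetical helper */
}

/* ... then flush, so the flush covers every request made so far. */
if (qio_channel_flush(p->c, &local_err) < 0) {
    /* propagate local_err and terminate the threads */
}
```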
-
Zhenzhong Duan authored
In multifd_queue_page() MultiFDPages_t.block is checked twice. Between the two checks, MultiFDPages_t.block may be reset to NULL by the multifd thread. This leads to the 2nd check always being true and a redundant page being submitted to the multifd thread again (see the sketch below). Signed-off-by:
Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com>
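A hedged sketch of the shape of the fix: decide from a single read of MultiFDPages_t.block and remember the result, instead of re-reading a field the multifd thread may have reset in the meantime. Names follow the message; the function body is simplified.

```c
bool block_changed = false;

if (!pages->block) {
    pages->block = block;            /* start a batch for this block */
} else if (pages->block != block) {
    block_changed = true;            /* recorded once, never re-read later */
}

/* ... append the page or flush the current batch ... */

if (block_changed) {
    /* The page belongs to a different block: re-queue it against that
     * block now, instead of re-testing pages->block (which may already
     * have been reset to NULL by the multifd thread). */
}
```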
-
manish.mishra authored
The current logic assumes that channel connections on the destination side are always established in the same order as on the source, and that the first one will always be the main channel, followed by the multifd or post-copy preemption channels. This may not always be true: even if a channel has a connection established on the source side, it can be in the pending state on the destination side and a newer connection can be established first, basically causing out-of-order mapping of channels on the destination side. Currently, all channels except post-copy preempt send a magic number; this patch uses that magic number to decide the type of channel. This logic is applicable only for precopy (multifd) live migration because, as mentioned, the post-copy preempt channel does not send any magic number. Also, tls live migration already does a tls handshake before creating other channels, so this issue is not possible with tls; hence this logic is avoided for tls live migrations. This patch uses a read peek to check the magic number of channels so that the current data/control stream management remains unaffected (see the sketch below). Reviewed-by:
Peter Xu <peterx@redhat.com> Reviewed-by:
Daniel P. Berrange <berrange@redhat.com> Reviewed-by:
Juan Quintela <quintela@redhat.com> Suggested-by:
Daniel P. Berrange <berrange@redhat.com> Signed-off-by:
manish.mishra <manish.mishra@nutanix.com> Signed-off-by:
Juan Quintela <quintela@redhat.com>
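As a self-contained illustration of the "read peek" idea, here it is with plain sockets; QEMU's QIOChannel-based peek helper is not reproduced, and the magic value is shown only as an example.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>

#define EXAMPLE_MULTIFD_MAGIC 0x11223344u   /* illustrative value */

/* Peek at the first 4 bytes without consuming them, so the existing
 * data/control stream handling afterwards is unaffected. */
static bool channel_starts_with_magic(int fd, uint32_t expected)
{
    uint32_t magic;
    ssize_t n = recv(fd, &magic, sizeof(magic), MSG_PEEK | MSG_WAITALL);

    return n == (ssize_t)sizeof(magic) && ntohl(magic) == expected;
}
```

A dispatcher on the destination would call something like this on each freshly accepted connection and route it to the main, multifd, or preempt handler accordingly.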
-
- Dec 15, 2022
-
-
Peter Xu authored
To prepare for thread-safety on page accountings, at least the counters below need to be accessed only atomically:
ram_counters.transferred
ram_counters.duplicate
ram_counters.normal
ram_counters.postcopy_bytes
There are a lot of other counters, but they won't be accessed outside the migration thread, so they are still safe to access without atomic ops (a sketch follows below). Reviewed-by:
Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by:
Peter Xu <peterx@redhat.com> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com>
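A minimal sketch of the kind of change involved, written with C11 atomics instead of QEMU's own wrappers; the counter names come from the list above.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Counters that may be touched from outside the migration thread. */
static struct {
    atomic_uint_fast64_t transferred;
    atomic_uint_fast64_t duplicate;
    atomic_uint_fast64_t normal;
    atomic_uint_fast64_t postcopy_bytes;
} ram_counters;

/* Writer side (any thread). */
static inline void account_transferred(uint64_t bytes)
{
    atomic_fetch_add_explicit(&ram_counters.transferred, bytes,
                              memory_order_relaxed);
}

/* Reader side (e.g. servicing query-migrate). */
static inline uint64_t read_transferred(void)
{
    return atomic_load_explicit(&ram_counters.transferred,
                                memory_order_relaxed);
}
```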
-
Juan Quintela authored
We were recalculating it left and right. We plan to change those values in the next patches. Signed-off-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Leonardo Bras <leobras@redhat.com>
-