- Nov 02, 2023
-
-
Peter Xu authored
Now we have a Error** passed into the return path thread stack, which is even clearer than an int retval. Change ram_dirty_bitmap_reload() and the callers to use a bool instead to replace errnos. Suggested-by:
Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by:
Peter Xu <peterx@redhat.com> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231017202633.296756-5-peterx@redhat.com>
-
Peter Xu authored
rp_state.error was a boolean used to show error happened in return path thread. That's not only duplicating error reporting (migrate_set_error), but also not good enough in that we only do error_report() and set it to true, we never can keep a history of the exact error and show it in query-migrate. To make this better, a few things done: - Use error_setg() rather than error_report() across the whole lifecycle of return path thread, keeping the error in an Error*. - With above, no need to have mark_source_rp_bad(), remove it, alongside with rp_state.error itself. - Use migrate_set_error() to apply that captured error to the global migration object when error occured in this thread. - Do the same when detected qemufile error in source return path We need to re-export qemu_file_get_error_obj() to do the last one. Signed-off-by:
Peter Xu <peterx@redhat.com> Reviewed-by:
Fabiano Rosas <farosas@suse.de> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231017202633.296756-2-peterx@redhat.com>
-
- Oct 30, 2023
-
-
Juan Quintela authored
So we can move more compression_counters stuff to ram-compress.c. Create compression_counters struct to add the stuff that was on MigrationState. Reviewed-by:
Lukas Straub <lukasstraub2@web.de> Reviewed-by:
Fabiano Rosas <farosas@suse.de> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231019110724.15324-8-quintela@redhat.com>
-
Juan Quintela authored
Now that we know it only handles zero, we can remove the ch parameter. Reviewed-by:
Fabiano Rosas <farosas@suse.de> Reviewed-by:
Peter Xu <peterx@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231019085259.13307-3-quintela@redhat.com>
-
- Jul 12, 2023
-
-
David Hildenbrand authored
virtio-mem wants to know whether it should not mess with the RAMBlock content (e.g., discard RAM, preallocate memory) on incoming migration. So let's expose that function as migrate_ram_is_ignored() in migration/misc.h Message-ID: <20230706075612.67404-4-david@redhat.com> Acked-by:
Peter Xu <peterx@redhat.com> Tested-by:
Mario Casquero <mcasquer@redhat.com> Reviewed-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Michael S. Tsirkin <mst@redhat.com> Signed-off-by:
David Hildenbrand <david@redhat.com>
-
- May 10, 2023
-
-
Lukas Straub authored
The overhead of the mutex in non-multifd mode is negligible, because in that case its just the single thread taking the mutex. This will be used in the next commits to add colo support to multifd. Signed-off-by:
Lukas Straub <lukasstraub2@web.de> Reviewed-by:
Juan Quintela <quintela@redhat.com> Message-Id: <22d83cb428f37929563155531bfb69fd8953cc61.1683572883.git.lukasstraub2@web.de> Signed-off-by:
Juan Quintela <quintela@redhat.com>
-
- May 03, 2023
-
-
Juan Quintela authored
Signed-off-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Lukas Straub <lukasstraub2@web.de>
-
Juan Quintela authored
Now that we have atomic counters, we can do it on the place that we need it, no need to do it inside ram.c. Signed-off-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Lukas Straub <lukasstraub2@web.de>
-
Juan Quintela authored
There is already include/qemu/stats.h, so stats.h was a bad idea. We want this file to not depend on anything else, we will move all the migration counters/stats to this struct. Signed-off-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Lukas Straub <lukasstraub2@web.de>
-
- Apr 27, 2023
-
-
Juan Quintela authored
As we set its value, it needs to be operated with atomics. We rename it from remaining to better reflect its meaning. Statistics always return the real reamaining bytes. This was used to store how much pages where dirty on the previous generation, so we can calculate the expected downtime as: dirty_bytes_last_sync / current_bandwith. If we use the actual remaining bytes, we would see a very small value at the end of the iteration. Signed-off-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Peter Xu <peterx@redhat.com> Reviewed-by:
Richard Henderson <richard.henderson@linaro.org> --- I am open to use ram_bytes_remaining() in its only use and be more "optimistic" about the downtime. Don't use __nocheck() functions. Use stat64_get() now that it exists.
-
Juan Quintela authored
Signed-off-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Richard Henderson <richard.henderson@linaro.org> Reviewed-by:
Peter Xu <peterx@redhat.com> --- Don't use __nocheck() variants Use stat64_get()
-
- Apr 24, 2023
-
-
Juan Quintela authored
Rest of counters that refer to pages has a _pages suffix. And historically, this showed the number of full pages transferred. The name "normal" refered to the fact that they were sent without any optimization (compression, xbzrle, zero_page, ...). Signed-off-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Peter Xu <peterx@redhat.com>
-
Juan Quintela authored
Rest of counters that refer to pages has a _pages suffix. And historically, this showed the number of pages composed of the same character, here comes the name "duplicated". But since years ago, it refers to the number of zero_pages. Signed-off-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Peter Xu <peterx@redhat.com>
-
Juan Quintela authored
Signed-off-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Peter Xu <peterx@redhat.com>
-
Juan Quintela authored
Signed-off-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Peter Xu <peterx@redhat.com>
-
Juan Quintela authored
Signed-off-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Peter Xu <peterx@redhat.com>
-
Juan Quintela authored
Signed-off-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Peter Xu <peterx@redhat.com>
-
Juan Quintela authored
Signed-off-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Peter Xu <peterx@redhat.com>
-
Juan Quintela authored
In the spirit of: commit 394d323bc3451e4d07f13341cb8817fac8dfbadd Author: Peter Xu <peterx@redhat.com> Date: Tue Oct 11 17:55:51 2022 -0400 migration: Use atomic ops properly for page accountings Reviewed-by:
David Edmondson <david.edmondson@oracle.com> Reviewed-by:
Peter Xu <peterx@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com>
-
Juan Quintela authored
Using MgrationStats as type for ram_counters mean that we didn't have to re-declare each value in another struct. The need of atomic counters have make us to create MigrationAtomicStats for this atomic counters. Create RAMStats type which is a merge of MigrationStats and MigrationAtomicStats removing unused members. Signed-off-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Peter Xu <peterx@redhat.com> --- Fix typos found by David Edmondson
-
- Dec 15, 2022
-
-
Peter Xu authored
To prepare for thread-safety on page accountings, at least below counters need to be accessed only atomically, they are: ram_counters.transferred ram_counters.duplicate ram_counters.normal ram_counters.postcopy_bytes There are a lot of other counters but they won't be accessed outside migration thread, then they're still safe to be accessed without atomic ops. Reviewed-by:
Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by:
Peter Xu <peterx@redhat.com> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com>
-
Juan Quintela authored
Signed-off-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Leonardo Bras <leobras@redhat.com>
-
Juan Quintela authored
Signed-off-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by:
David Edmondson <david.edmondson@oracle.com> Reviewed-by:
Leonardo Bras <leobras@redhat.com>
-
- Jul 20, 2022
-
-
Leonardo Bras authored
Some errors, like the lack of Scatter-Gather support by the network interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on using zero-copy, which causes it to fall back to the default copying mechanism. After each full dirty-bitmap scan there should be a zero-copy flush happening, which checks for errors each of the previous calls to sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then increment dirty_sync_missed_zero_copy migration stat to let the user know about it. Signed-off-by:
Leonardo Bras <leobras@redhat.com> Reviewed-by:
Daniel P. Berrangé <berrange@redhat.com> Acked-by:
Peter Xu <peterx@redhat.com> Message-Id: <20220711211112.18951-4-leobras@redhat.com> Signed-off-by:
Dr. David Alan Gilbert <dgilbert@redhat.com>
-
Peter Xu authored
Create a new socket for postcopy to be prepared to send postcopy requested pages via this specific channel, so as to not get blocked by precopy pages. A new thread is also created on dest qemu to receive data from this new channel based on the ram_load_postcopy() routine. The ram_load_postcopy(POSTCOPY) branch and the thread has not started to function, and that'll be done in follow up patches. Cleanup the new sockets on both src/dst QEMUs, meanwhile look after the new thread too to make sure it'll be recycled properly. Reviewed-by:
Daniel P. Berrangé <berrange@redhat.com> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Peter Xu <peterx@redhat.com> Message-Id: <20220707185502.27149-1-peterx@redhat.com> Signed-off-by:
Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: With Peter's fix to quieten compiler warning on start_migration
-
- Apr 21, 2022
-
-
Peter Xu authored
Will be reused in postcopy fast load thread. Reviewed-by:
Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by:
Peter Xu <peterx@redhat.com> Message-Id: <20220331150857.74406-6-peterx@redhat.com> Reviewed-by:
Daniel P. Berrangé <berrange@redhat.com> Signed-off-by:
Dr. David Alan Gilbert <dgilbert@redhat.com>
-
- Jan 28, 2022
-
-
Peter Xu authored
It will just never fail. Drop those return values where they're constantly zeros. A tiny touch-up on the tracepoint so trace_ram_postcopy_send_discard_bitmap() is called after the logic itself (which sounds more reasonable). Signed-off-by:
Peter Xu <peterx@redhat.com> Reviewed-by:
Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by:
Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com>
-
Peter Xu authored
I planned to add "#ifdef DEBUG_POSTCOPY" around the function too because otherwise it'll be compiled into qemu binary even if it'll never be used. Then I found that maybe it's easier to just drop it for good.. Signed-off-by:
Peter Xu <peterx@redhat.com> Reviewed-by:
Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by:
Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com>
-
- Nov 09, 2021
-
-
Rao, Lei authored
if we don't reset the auto-converge counter, it will continue to run with COLO running, and eventually the system will hang due to the CPU throttle reaching DEFAULT_MIGRATE_MAX_CPU_THROTTLE. Signed-off-by:
Lei Rao <lei.rao@intel.com> Reviewed-by:
Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by:
Lukas Straub <lukasstraub2@web.de> Tested-by:
Lukas Straub <lukasstraub2@web.de> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com>
-
- Nov 01, 2021
-
-
David Hildenbrand authored
Currently, when someone (i.e., the VM) accesses discarded parts inside a RAMBlock with a RamDiscardManager managing the corresponding mapped memory region, postcopy will request migration of the corresponding page from the source. The source, however, will never answer, because it refuses to migrate such pages with undefined content ("logically unplugged"): the pages are never dirty, and get_queued_page() will consequently skip processing these postcopy requests. Especially reading discarded ("logically unplugged") ranges is supposed to work in some setups (for example with current virtio-mem), although it barely ever happens: still, not placing a page would currently stall the VM, as it cannot make forward progress. Let's check the state via the RamDiscardManager (the state e.g., of virtio-mem is migrated during precopy) and avoid sending a request that will never get answered. Place a fresh zero page instead to keep the VM working. This is the same behavior that would happen automatically without userfaultfd being active, when accessing virtual memory regions without populated pages -- "populate on demand". For now, there are valid cases (as documented in the virtio-mem spec) where a VM might read discarded memory; in the future, we will disallow that. Then, we might want to handle that case differently, e.g., warning the user that the VM seems to be mis-behaving. Reviewed-by:
Peter Xu <peterx@redhat.com> Signed-off-by:
David Hildenbrand <david@redhat.com> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com>
-
- Apr 07, 2021
-
-
Andrey Gruzdev authored
This commit solves the issue with userfault_fd WP feature that background snapshot is based on. For any never poluated or discarded memory page, the UFFDIO_WRITEPROTECT ioctl() would skip updating PTE for that page, thereby loosing WP setting for it. So we need to pre-fault pages for each RAM block to be protected before making a userfault_fd wr-protect ioctl(). Fixes: 278e2f55 (migration: support UFFD write fault processing in ram_save_iterate()) Signed-off-by:
Andrey Gruzdev <andrey.gruzdev@virtuozzo.com> Reported-by:
David Hildenbrand <david@redhat.com> Reviewed-by:
David Hildenbrand <david@redhat.com> Message-Id: <20210401092226.102804-4-andrey.gruzdev@virtuozzo.com> Signed-off-by:
Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: Bodged ifdef __linux__ on ram_write_tracking_prepare, should really go in a stub
-
- Feb 08, 2021
-
-
Markus Armbruster authored
73af8dd8 "migration: Make xbzrle_cache_size a migration parameter" (v2.11.0) made the new parameter unsigned (QAPI type 'size', uint64_t in C). It neglected to update existing code, which continues to use int64_t. migrate_xbzrle_cache_size() returns the new parameter. Adjust its return type. QMP query-migrate-cache-size returns migrate_xbzrle_cache_size(). Adjust its return type. migrate-set-parameters passes the new parameter to xbzrle_cache_resize(). Adjust its parameter type. xbzrle_cache_resize() passes it on to cache_init(). Adjust its parameter type. Signed-off-by:
Markus Armbruster <armbru@redhat.com> Reviewed-by:
Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210202141734.2488076-3-armbru@redhat.com> Signed-off-by:
Dr. David Alan Gilbert <dgilbert@redhat.com>
-
Andrey Gruzdev authored
In this particular implementation the same single migration thread is responsible for both normal linear dirty page migration and procesing UFFD page fault events. Processing write faults includes reading UFFD file descriptor, finding respective RAM block and saving faulting page to the migration stream. After page has been saved, write protection can be removed. Since asynchronous version of qemu_put_buffer() is expected to be used to save pages, we also have to flush migraion stream prior to un-protecting saved memory range. Write protection is being removed for any previously protected memory chunk that has hit the migration stream. That's valid for pages from linear page scan along with write fault pages. Signed-off-by:
Andrey Gruzdev <andrey.gruzdev@virtuozzo.com> Acked-by:
Peter Xu <peterx@redhat.com> Reviewed-by:
Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210129101407.103458-4-andrey.gruzdev@virtuozzo.com> Signed-off-by:
Dr. David Alan Gilbert <dgilbert@redhat.com> fixup pagefault.address cast for 32bit
-
Andrey Gruzdev authored
Add new capability to 'qapi/migration.json' schema. Update migrate_caps_check() to validate enabled capability set against introduced one. Perform checks for required kernel features and compatibility with guest memory backends. Signed-off-by:
Andrey Gruzdev <andrey.gruzdev@virtuozzo.com> Reviewed-by:
Peter Xu <peterx@redhat.com> Acked-by:
Markus Armbruster <armbru@redhat.com> Message-Id: <20210129101407.103458-2-andrey.gruzdev@virtuozzo.com> Signed-off-by:
Dr. David Alan Gilbert <dgilbert@redhat.com>
-
- Sep 25, 2020
-
-
Chuan Zheng authored
RAMBLOCK_FOREACH_MIGRATABLE is need in dirtyrate measure, move the existing definition up into migration/ram.h Signed-off-by:
Chuan Zheng <zhengchuan@huawei.com> Reviewed-by:
Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by:
David Edmondson <david.edmondson@oracle.com> Reviewed-by:
Li Qiang <liq3ea@gmail.com> Message-Id: <1600237327-33618-6-git-send-email-zhengchuan@huawei.com> Signed-off-by:
Dr. David Alan Gilbert <dgilbert@redhat.com>
-
- Jun 01, 2020
-
-
Lukas Straub authored
If we suceed in receiving ram state, but fail receiving the device state, there will be a mismatch between the two. Fix this by flushing the ram cache only after the vmstate has been received. Signed-off-by:
Lukas Straub <lukasstraub2@web.de> Message-Id: <3289d007d494cb0e2f05b1cf4ae6a78d300fede3.1589193382.git.lukasstraub2@web.de> Reviewed-by:
zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by:
Dr. David Alan Gilbert <dgilbert@redhat.com>
-
- Mar 13, 2020
-
-
Hailiang Zhang authored
This patch will reduce the downtime of VM for the initial process, Previously, we copied all these memory in preparing stage of COLO while we need to stop VM, which is a time-consuming process. Here we optimize it by a trick, back-up every page while in migration process while COLO is enabled, though it affects the speed of the migration, but it obviously reduce the downtime of back-up all SVM'S memory in COLO preparing stage. Signed-off-by:
zhanghailiang <zhang.zhanghailiang@huawei.com> Message-Id: <20200224065414.36524-5-zhang.zhanghailiang@huawei.com> Signed-off-by:
Dr. David Alan Gilbert <dgilbert@redhat.com> minor typo fixes
-
- Jan 29, 2020
-
-
Juan Quintela authored
Signed-off-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Dr. David Alan Gilbert <dgilbert@redhat.com>
-
Juan Quintela authored
We need to change the full chain to pass the Error parameter. Signed-off-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Dr. David Alan Gilbert <dgilbert@redhat.com>
-
Juan Quintela authored
Signed-off-by:
Juan Quintela <quintela@redhat.com> Reviewed-by:
Dr. David Alan Gilbert <dgilbert@redhat.com>
-