- Oct 18, 2023
-
-
Richard Henderson authored
This line is supposed to be unreachable, but if we're going to have it at all, SIGABRT via abort() is subject to the same signal peril that created this function in the first place. We can _exit immediately without peril. Acked-by:
Helge Deller <deller@gmx.de> Reviewed-by:
Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by:
Richard Henderson <richard.henderson@linaro.org>
-
Richard Henderson authored
Because we trap so many signals for use by the guest, we have to take extra steps to exit properly. Acked-by:
Helge Deller <deller@gmx.de> Reviewed-by:
Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by:
Richard Henderson <richard.henderson@linaro.org>
-
Richard Henderson authored
Do not assert success, but return any failure received. Additionally, fix the method of earlier error return in target_munmap. Reported-by:
Andreas Schwab <schwab@suse.de> Signed-off-by:
Richard Henderson <richard.henderson@linaro.org>
-
Jiajie Chen authored
Since support for LSX and LASX is landed in QEMU recently, we can update HWCAPS accordingly. Signed-off-by:
Jiajie Chen <c@jia.je> Reviewed-by:
Richard Henderson <richard.henderson@linaro.org> Message-Id: <20231001085315.1692667-1-c@jia.je> Signed-off-by:
Richard Henderson <richard.henderson@linaro.org>
-
Mikulas Patocka authored
sh4 uses gUSA (general UserSpace Atomicity) to provide atomicity on CPUs that don't have atomic instructions. A gUSA region that adds 1 to an atomic variable stored in @R2 looks like this: 4004b6: 03 c7 mova 4004c4 <gusa+0x10>,r0 4004b8: f3 61 mov r15,r1 4004ba: 09 00 nop 4004bc: fa ef mov #-6,r15 4004be: 22 63 mov.l @r2,r3 4004c0: 01 73 add #1,r3 4004c2: 32 22 mov.l r3,@r2 4004c4: 13 6f mov r1,r15 R0 contains a pointer to the end of the gUSA region R1 contains the saved stack pointer R15 contains negative length of the gUSA region When this region is interrupted by a signal, the kernel detects if R15 >= -128U. If yes, the kernel rolls back PC to the beginning of the region and restores SP by copying R1 to R15. The problem happens if we are interrupted by a signal at address 4004c4. R15 still holds the value -6, but the atomic value was already written by an instruction at address 4004c2. In this situation we can't undo the gUSA. The function unwind_gusa does nothing, the signal handler attempts to push a signal frame to the address -6 and crashes. This patch fixes it, so that if we are interrupted at the last instruction in a gUSA region, we copy R1 to R15 to restore the correct stack pointer and avoid crashing. There's another bug: if we are interrupted in a delay slot, we save the address of the instruction in the delay slot. We must save the address of the previous instruction. Cc: qemu-stable@nongnu.org Signed-off-by:
Mikulas Patocka <mpatocka@redhat.com> Reviewed-by:
Yoshinori Sato <ysato@users.sourcefoege.jp> Message-Id: <b16389f7-6c62-70b7-59b3-87533c0bcc@redhat.com> Reviewed-by:
Richard Henderson <richard.henderson@linaro.org> Signed-off-by:
Richard Henderson <richard.henderson@linaro.org>
-
Mikulas Patocka authored
QEMU mips userspace emulation crashes with "qemu: unhandled CPU exception 0x15 - aborting" when one of the integer arithmetic instructions detects an overflow. This patch fixes it so that it delivers SIGFPE with FPE_INTOVF instead. Cc: qemu-stable@nongnu.org Signed-off-by:
Mikulas Patocka <mpatocka@redhat.com> Message-Id: <3ef979a8-3ee1-eb2d-71f7-d788ff88dd11@redhat.com> Reviewed-by:
Richard Henderson <richard.henderson@linaro.org> Signed-off-by:
Richard Henderson <richard.henderson@linaro.org>
-
Richard Henderson authored
The previous change, 2d385be6, assumed !PAGE_VALID meant that the page would be unmapped by the elf image. However, since we reserved the entire image space via mmap, PAGE_VALID will always be set. Instead, assume PROT_NONE for the same condition. Furthermore, assume bss is only ever present for writable segments, and that there is no page overlap between PT_LOAD segments. Instead of an assert, return false to indicate failure. Cc: qemu-stable@nongnu.org Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1854 Fixes: 2d385be6 ("linux-user: Do not adjust zero_bss for host page size") Reviewed-by:
Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by:
Richard Henderson <richard.henderson@linaro.org>
-
- Oct 17, 2023
-
-
https://gitlab.com/juan.quintela/qemuStefan Hajnoczi authored
Migration Pull request (20231017) Hi Same that yesterday one, except: - rebased to latest (clean rebase) - fixed 64 bits read on big endian host CI: https://gitlab.com/juan.quintela/qemu/-/pipelines/1039214198 Please, apply. # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEEGJn/jt6/WMzuA0uC9IfvGFhy1yMFAmUuReUACgkQ9IfvGFhy # 1yO+FQ/+Nx2botbrUVJb3vLeG6f+x5xeWJjB0boOqhk7227cKmAA33Oqwx5l4UtL # oLOHA6P4ThqacpaluGOMMp44BSr/jOMDC/HUDVJtSplTD+droPiklIIGUfYScLbA # oYx6lXfSB2jMpSuSU19STbjwBRvd4bjJix3zDGwEIgXYqYt0tY0FY/nnGTmImnM1 # KDjRerf1lg4Rt0vvwg7I0onIDvh3CKX26Sj5a3wSRaLoocUe3jpsuBNH7MMqroHs # WpocBIsLiBAf/CbeLZsQlhbVeOi1R+kSAR5hDPvvJCPWHIrd2wf8+3NXjcFepb7d # M4wE2jLjCvHhzwYwSc0ir4n74jwD22IirEPQs8ONHrjLCb5VoBKYV5bqsFUHF55N # SbFvcZIzJFiOm2anEWiiqiNTLtYAdQCKtUvbyJ7Mq4ck6icIInLdX9zrm4voofYJ # 02lX/IIGlT3C3dGSz09LBoJ6E82zmQWNHmov8A90+3RYvMF9uSpxi0z40lhj6jWC # 6Q2AHxrJJ040ZboeOfJQG78BtvZ/9PQ2ORhJ3ceRDND4kSTDtfe/TSNAZ3thM33y # Sv99o+F/HaqrKnxK8eTJrvIEWxojDu3lnqJERWAm2AOxTnQ+6mgGtsCfLEdrv5D1 # xVsY2QczB1quRjaU2ml/7Cxe4Q1urTtfl82IEXGded6UL+cmF/I= # =br93 # -----END PGP SIGNATURE----- # gpg: Signature made Tue 17 Oct 2023 04:29:25 EDT # gpg: using RSA key 1899FF8EDEBF58CCEE034B82F487EF185872D723 # gpg: Good signature from "Juan Quintela <quintela@redhat.com>" [full] # gpg: aka "Juan Quintela <quintela@trasno.org>" [full] # Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723 * tag 'migration-20231017-pull-request' of https://gitlab.com/juan.quintela/qemu : (38 commits) migration/multifd: Clarify Error usage in multifd_channel_connect migration/multifd: Unify multifd_send_thread error paths migration/multifd: Remove direct "socket" references migration/ram: Merge save_zero_page functions migration/ram: Move xbzrle zero page handling into save_zero_page migration/ram: Stop passing QEMUFile around in save_zero_page migration/ram: Remove RAMState from xbzrle_cache_zero_page migration/ram: Refactor precopy ram loading code multifd: reset next_packet_len after sending pages multifd: fix counters in multifd_send_thread migration: check for rate_limit_max for RATE_LIMIT_DISABLED migration: Improve json and formatting migration/rdma: Remove all "ret" variables that are used only once migration/rdma: Declare for index variables local migration/rdma: Use i as for index instead of idx migration/rdma: Check sooner if we are in postcopy for save_page() migration/rdma: Remove qemu_ prefix from exported functions migration/rdma: Move rdma constants from qemu-file.h to rdma.h qemu-file: Remove QEMUFileHooks migration/rdma: Create rdma_control_save_page() ... Signed-off-by:
Stefan Hajnoczi <stefanha@redhat.com>
-
https://gitlab.com/marcandre.lureau/qemuStefan Hajnoczi authored
virtio-gpu rutabaga support # -----BEGIN PGP SIGNATURE----- # # iQJQBAABCAA6FiEEh6m9kz+HxgbSdvYt2ujhCXWWnOUFAmUtP5YcHG1hcmNhbmRy # ZS5sdXJlYXVAcmVkaGF0LmNvbQAKCRDa6OEJdZac5X9CD/4s1n/GZyDr9bh04V03 # otAqtq2CSyuUOviqBrqxYgraCosUD1AuX8WkDy5cCPtnKC4FxRjgVlm9s7K/yxOW # xZ78e4oVgB1F3voOq6LgtKK6BRG/BPqNzq9kuGcayCHQbSxg7zZVwa702Y18r2ZD # pjOhbZCrJTSfASL7C3e/rm7798Wk/hzSrClGR56fbRAVgQ6Lww2L97/g0nHyDsWK # DrCBrdqFtKjpLeUHmcqqS4AwdpG2SyCgqE7RehH/wOhvGTxh/JQvHbLGWK2mDC3j # Qvs8mClC5bUlyNQuUz7lZtXYpzCW6VGMWlz8bIu+ncgSt6RK1TRbdEfDJPGoS4w9 # ZCGgcTxTG/6BEO76J/VpydfTWDo1FwQCQ0Vv7EussGoRTLrFC3ZRFgDWpqCw85yi # AjPtc0C49FHBZhK0l1CoJGV4gGTDtD9jTYN0ffsd+aQesOjcsgivAWBaCOOQWUc8 # KOv9sr4kLLxcnuCnP7p/PuVRQD4eg0TmpdS8bXfnCzLSH8fCm+n76LuJEpGxEBey # 3KPJPj/1BNBgVgew+znSLD/EYM6YhdK2gF5SNrYsdR6UcFdrPED/xmdhzFBeVym/ # xbBWqicDw4HLn5YrJ4tzqXje5XUz5pmJoT5zrRMXTHiu4pjBkEXO/lOdAoFwSy8M # WNOtmSyB69uCrbyLw6xE2/YX8Q== # =5a/Z # -----END PGP SIGNATURE----- # gpg: Signature made Mon 16 Oct 2023 09:50:14 EDT # gpg: using RSA key 87A9BD933F87C606D276F62DDAE8E10975969CE5 # gpg: issuer "marcandre.lureau@redhat.com" # gpg: Good signature from "Marc-André Lureau <marcandre.lureau@redhat.com>" [full] # gpg: aka "Marc-André Lureau <marcandre.lureau@gmail.com>" [full] # Primary key fingerprint: 87A9 BD93 3F87 C606 D276 F62D DAE8 E109 7596 9CE5 * tag 'gpu-pull-request' of https://gitlab.com/marcandre.lureau/qemu : docs/system: add basic virtio-gpu documentation gfxstream + rutabaga: enable rutabaga gfxstream + rutabaga: meson support gfxstream + rutabaga: add initial support for gfxstream gfxstream + rutabaga prep: added need defintions, fields, and options virtio-gpu: blob prep virtio-gpu: hostmem virtio-gpu: CONTEXT_INIT feature virtio: Add shared memory capability Signed-off-by:
Stefan Hajnoczi <stefanha@redhat.com>
-
Fabiano Rosas authored
The function is currently called from two sites, one always gives it a NULL Error and the other always gives it a non-NULL Error. In the non-NULL case, all it does it trace the error and return. One of the callers already have tracing, add a tracepoint to the other and stop passing the error into the function. Cc: Markus Armbruster <armbru@redhat.com> Signed-off-by:
Fabiano Rosas <farosas@suse.de> Reviewed-by:
Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231012134343.23757-4-farosas@suse.de>
-
Fabiano Rosas authored
The preferred usage of the Error type is to always set both the return code and the error when a failure happens. As all code called from the send thread follows this pattern, we'll always have the return code and the error set at the same time. Aside from the convention, in this piece of code this must be the case, otherwise the if (ret != 0) would be exiting the thread without calling multifd_send_terminate_threads() which is incorrect. Unify both paths to make it clear that both are taken when there's an error. Signed-off-by:
Fabiano Rosas <farosas@suse.de> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231012134343.23757-3-farosas@suse.de>
-
Fabiano Rosas authored
We're about to enable support for other transports in multifd, so remove direct references to sockets. Signed-off-by:
Fabiano Rosas <farosas@suse.de> Reviewed-by:
Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231012134343.23757-2-farosas@suse.de>
-
Fabiano Rosas authored
We don't need to do this in two pieces. One single function makes it easier to grasp, specially since it removes the indirection on the return value handling. Reviewed-by:
Peter Xu <peterx@redhat.com> Signed-off-by:
Fabiano Rosas <farosas@suse.de> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231011184604.32364-7-farosas@suse.de>
-
Fabiano Rosas authored
It makes a bit more sense to have the zero page handling of xbzrle right where we save the zero page. Also invert the exit condition to remove one level of indentation which makes the next patch easier to grasp. Reviewed-by:
Peter Xu <peterx@redhat.com> Signed-off-by:
Fabiano Rosas <farosas@suse.de> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231011184604.32364-6-farosas@suse.de>
-
Fabiano Rosas authored
We don't need the QEMUFile when we're already passing the PageSearchStatus. Reviewed-by:
Peter Xu <peterx@redhat.com> Signed-off-by:
Fabiano Rosas <farosas@suse.de> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231011184604.32364-5-farosas@suse.de>
-
Fabiano Rosas authored
'rs' is not used in that function. It's a leftover from commit 9360447d ("ram: Use MigrationStats for statistics"). Reviewed-by:
Peter Xu <peterx@redhat.com> Signed-off-by:
Fabiano Rosas <farosas@suse.de> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231011184604.32364-4-farosas@suse.de>
-
Nikolay Borisov authored
Extract the ramblock parsing code into a routine that operates on the sequence of headers from the stream and another the parses the individual ramblock. This makes ram_load_precopy() easier to comprehend. Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by:
Peter Xu <peterx@redhat.com> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Fabiano Rosas <farosas@suse.de> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231011184604.32364-3-farosas@suse.de>
-
Elena Ufimtseva authored
Sometimes multifd sends just sync packet with no pages (normal_num is 0). In this case the old value is being preserved and being accounted for while only packet_len is being transferred. Reset it to 0 after sending and accounting for. Signed-off-by:
Elena Ufimtseva <elena.ufimtseva@oracle.com> Reviewed-by:
Fabiano Rosas <farosas@suse.de> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231011184358.97349-5-elena.ufimtseva@oracle.com>
-
Elena Ufimtseva authored
Previous commit cbec7eb7 "migration/multifd: Compute transferred bytes correctly" removed accounting for packet_len in non-rdma case, but the next_packet_size only accounts for pages, not for the header packet (normal_pages * PAGE_SIZE) that is being sent as iov[0]. The packet_len part should be added to account for the size of MultiFDPacket and the array of the offsets. Signed-off-by:
Elena Ufimtseva <elena.ufimtseva@oracle.com> Reviewed-by:
Fabiano Rosas <farosas@suse.de> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231011184358.97349-4-elena.ufimtseva@oracle.com>
-
Elena Ufimtseva authored
In migration rate limiting atomic operations are used to read the rate limit variables and transferred bytes and they are expensive. Check first if rate_limit_max is equal to RATE_LIMIT_DISABLED and return false immediately if so. Note that with this patch we will also will stop flushing by not calling qemu_fflush() from migration_transferred_bytes() if the migration rate is not exceeded. This should be fine since migration thread calls in the loop migration_update_counters from migration_rate_limit() that calls the migration_transferred_bytes() and flushes there. Signed-off-by:
Elena Ufimtseva <elena.ufimtseva@oracle.com> Reviewed-by:
Fabiano Rosas <farosas@suse.de> Reviewed-by:
Peter Xu <peterx@redhat.com> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231011184358.97349-2-elena.ufimtseva@oracle.com>
-
Juan Quintela authored
Reviewed-by:
Markus Armbruster <armbru@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231013104736.31722-2-quintela@redhat.com>
-
Juan Quintela authored
Change code that is: int ret; ... ret = foo(); if (ret[ < 0]?) { to: if (foo()[ < 0]) { Reviewed-by:
Fabiano Rosas <farosas@suse.de> Reviewed-by:
Li Zhijian <lizhijian@fujitsu.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231011203527.9061-14-quintela@redhat.com>
-
Juan Quintela authored
Declare all variables that are only used inside a for loop inside the for statement. This makes clear that they are not used outside of the for loop. Reviewed-by:
Fabiano Rosas <farosas@suse.de> Reviewed-by:
Li Zhijian <lizhijian@fujitsu.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231011203527.9061-13-quintela@redhat.com>
-
Juan Quintela authored
Once there, all the uses are local to the for, so declare the variable inside the for statement. Reviewed-by:
Fabiano Rosas <farosas@suse.de> Reviewed-by:
Li Zhijian <lizhijian@fujitsu.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231011203527.9061-12-quintela@redhat.com>
-
Juan Quintela authored
Reviewed-by:
Peter Xu <peterx@redhat.com> Reviewed-by:
Li Zhijian <lizhijian@fujitsu.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231011203527.9061-11-quintela@redhat.com>
-
Juan Quintela authored
Functions are long enough even without this. Reviewed-by:
Peter Xu <peterx@redhat.com> Reviewed-by:
Li Zhijian <lizhijian@fujitsu.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231011203527.9061-10-quintela@redhat.com>
-
Juan Quintela authored
Reviewed-by:
Peter Xu <peterx@redhat.com> Reviewed-by:
Li Zhijian <lizhijian@fujitsu.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231011203527.9061-9-quintela@redhat.com>
-
Juan Quintela authored
The only user was rdma, and its use is gone. Reviewed-by:
Peter Xu <peterx@redhat.com> Reviewed-by:
Li Zhijian <lizhijian@fujitsu.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231011203527.9061-8-quintela@redhat.com>
-
Juan Quintela authored
The only user of ram_control_save_page() and save_page() hook was rdma. Just move the function to rdma.c, rename it to rdma_control_save_page(). Reviewed-by:
Peter Xu <peterx@redhat.com> Reviewed-by:
Li Zhijian <lizhijian@fujitsu.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231011203527.9061-7-quintela@redhat.com>
-
Juan Quintela authored
There is only one flag called with: RAM_CONTROL_BLOCK_REG. Reviewed-by:
Li Zhijian <lizhijian@fujitsu.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231011203527.9061-6-quintela@redhat.com>
-
Juan Quintela authored
Instead of going through ram_control_load_hook(), call qemu_rdma_registration_handle() directly. Reviewed-by:
Li Zhijian <lizhijian@fujitsu.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231011203527.9061-5-quintela@redhat.com>
-
Juan Quintela authored
Once there: - Remove unused data parameter - unfold it in its callers - change all callers to call qemu_rdma_registration_stop() - We need to call QIO_CHANNEL_RDMA() after we check for migrate_rdma() Reviewed-by:
Li Zhijian <lizhijian@fujitsu.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231011203527.9061-4-quintela@redhat.com>
-
Juan Quintela authored
Once there: - Remove unused data parameter - unfold it in its callers. - change all callers to call qemu_rdma_registration_start() - We need to call QIO_CHANNEL_RDMA() after we check for migrate_rdma() Reviewed-by:
Li Zhijian <lizhijian@fujitsu.com> Reviewed-by:
Fabiano Rosas <farosas@suse.de> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231011203527.9061-3-quintela@redhat.com>
-
Juan Quintela authored
Helper to say if we are doing a migration over rdma. Reviewed-by:
Peter Xu <peterx@redhat.com> Reviewed-by:
Li Zhijian <lizhijian@fujitsu.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231011203527.9061-2-quintela@redhat.com>
-
Juan Quintela authored
RDMA was having trouble because migrate_multifd_flush_after_each_section() can only be true or false, but we don't want to send any flush when we are not in multifd migration. CC: Fabiano Rosas <farosas@suse.de Fixes: 294e5a40 ("multifd: Only flush once each full round of memory") Reported-by:
Li Zhijian <lizhijian@fujitsu.com> Reviewed-by:
Li Zhijian <lizhijian@fujitsu.com> Reviewed-by:
Peter Xu <peterx@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231011205548.10571-2-quintela@redhat.com>
-
Fiona Ebner authored
This is intended to be a semantic revert of commit 9b095037 ("migration: run setup callbacks out of big lock"). There have been so many changes since that commit (e.g. a new setup callback dirty_bitmap_save_setup() that also needs to be adapted now), it's easier to do the revert manually. For snapshots, the bdrv_writev_vmstate() function is used during setup (in QIOChannelBlock backing the QEMUFile), but not holding the BQL while calling it could lead to an assertion failure. To understand how, first note the following: 1. Generated coroutine wrappers for block layer functions spawn the coroutine and use AIO_WAIT_WHILE()/aio_poll() to wait for it. 2. If the host OS switches threads at an inconvenient time, it can happen that a bottom half scheduled for the main thread's AioContext is executed as part of a vCPU thread's aio_poll(). An example leading to the assertion failure is as follows: main thread: 1. A snapshot-save QMP command gets issued. 2. snapshot_save_job_bh() is scheduled. vCPU thread: 3. aio_poll() for the main thread's AioContext is called (e.g. when the guest writes to a pflash device, as part of blk_pwrite which is a generated coroutine wrapper). 4. snapshot_save_job_bh() is executed as part of aio_poll(). 3. qemu_savevm_state() is called. 4. qemu_mutex_unlock_iothread() is called. Now qemu_get_current_aio_context() returns 0x0. 5. bdrv_writev_vmstate() is executed during the usual savevm setup via qemu_fflush(). But this function is a generated coroutine wrapper, so it uses AIO_WAIT_WHILE. There, the assertion assert(qemu_get_current_aio_context() == qemu_get_aio_context()); will fail. To fix it, ensure that the BQL is held during setup. While it would only be needed for snapshots, adapting migration too avoids additional logic for conditional locking/unlocking in the setup callbacks. Writing the header could (in theory) also trigger qemu_fflush() and thus bdrv_writev_vmstate(), so the locked section also covers the qemu_savevm_state_header() call, even for migration for consistency. The section around multifd_send_sync_main() needs to be unlocked to avoid a deadlock. In particular, the multifd_save_setup() function calls socket_send_channel_create() using multifd_new_send_channel_async() as a callback and then waits for the callback to signal via the channels_ready semaphore. The connection happens via qio_task_run_in_thread(), but the callback is only executed via qio_task_thread_result() which is scheduled for the main event loop. Without unlocking the section, the main thread would never get to process the task result and the callback meaning there would be no signal via the channels_ready semaphore. The comment in ram_init_bitmaps() was introduced by 49877834 ("migration: fix incorrect memory_global_dirty_log_start outside BQL") and is removed, because it referred to the qemu_mutex_lock_iothread() call. Signed-off-by:
Fiona Ebner <f.ebner@proxmox.com> Reviewed-by:
Fabiano Rosas <farosas@suse.de> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231013105839.415989-1-f.ebner@proxmox.com>
-
Fabiano Rosas authored
Add basic tests for file-based migration. Note that we cannot use test_precopy_common because that routine expects it to be possible to run the migration live. With the file transport there is no live migration because we must wait for the source to finish writing the migration data to the file before the destination can start reading. Add a new migration function specifically to handle the file migration. Reviewed-by:
Peter Xu <peterx@redhat.com> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Fabiano Rosas <farosas@suse.de> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20230712190742.22294-7-farosas@suse.de>
-
Fabiano Rosas authored
Add a smoke test that migrates to a file and gives it to the script. It should catch the most annoying errors such as changes in the ram flags. After code has been merged it becomes way harder to figure out what is causing the script to fail, the person making the change is the most likely to know right away what the problem is. Signed-off-by:
Fabiano Rosas <farosas@suse.de> Acked-by:
Thomas Huth <thuth@redhat.com> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231009184326.15777-7-farosas@suse.de>
-
Fabiano Rosas authored
The migration code uses unsigned values for 16, 32 and 64-bit operations. Fix the script to do the same. This was causing an issue when parsing the migration stream generated on the ppc64 target because one of instance_ids was larger than the 32bit signed maximum: Traceback (most recent call last): File "/home/fabiano/kvm/qemu/build/scripts/analyze-migration.py", line 658, in <module> dump.read(dump_memory = args.memory) File "/home/fabiano/kvm/qemu/build/scripts/analyze-migration.py", line 592, in read classdesc = self.section_classes[section_key] KeyError: ('spapr_iommu', -2147483648) Signed-off-by:
Fabiano Rosas <farosas@suse.de> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231009184326.15777-6-farosas@suse.de>
-
Fabiano Rosas authored
The script is currently broken when the x-ignore-shared capability is used: Traceback (most recent call last): File "./scripts/analyze-migration.py", line 656, in <module> dump.read(dump_memory = args.memory) File "./scripts/analyze-migration.py", line 593, in read section.read() File "./scripts/analyze-migration.py", line 163, in read self.name = self.file.readstr(len = namelen) File "./scripts/analyze-migration.py", line 53, in readstr return self.readvar(len).decode('utf-8') UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 55: invalid start byte We're currently adding data to the middle of the ram section depending on the presence of the capability. As a consequence, any code loading the ram section needs to know about capabilities so it can interpret the stream. Skip the byte that's added when x-ignore-shared is used to fix the script. Signed-off-by:
Fabiano Rosas <farosas@suse.de> Reviewed-by:
Juan Quintela <quintela@redhat.com> Signed-off-by:
Juan Quintela <quintela@redhat.com> Message-ID: <20231009184326.15777-5-farosas@suse.de>
-