    migration: hold the BQL during setup · 930e239d
    Fiona Ebner authored
    
    
    This is intended to be a semantic revert of commit 9b095037
    ("migration: run setup callbacks out of big lock"). There have been so
    many changes since that commit (e.g. a new setup callback
    dirty_bitmap_save_setup() that also needs to be adapted now), it's
    easier to do the revert manually.
    
    For snapshots, the bdrv_writev_vmstate() function is used during setup
    (in QIOChannelBlock backing the QEMUFile), but not holding the BQL
    while calling it could lead to an assertion failure. To understand
    how, first note the following:
    
    1. Generated coroutine wrappers for block layer functions spawn the
    coroutine and use AIO_WAIT_WHILE()/aio_poll() to wait for it (a
    simplified sketch of this pattern follows the list).
    2. If the host OS switches threads at an inconvenient time, it can
    happen that a bottom half scheduled for the main thread's AioContext
    is executed as part of a vCPU thread's aio_poll().
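    
    To illustrate point 1, such a wrapper has roughly the following shape.
    This is a simplified sketch, not the actual generated code; the
    BdrvVmstateCo struct and the bdrv_writev_vmstate_entry() helper are
    made-up names for illustration:
    
        /* Simplified sketch of a generated coroutine wrapper (illustrative only). */
        int bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
        {
            BdrvVmstateCo s = { .bs = bs, .qiov = qiov, .pos = pos, .ret = -EINPROGRESS };
    
            if (qemu_in_coroutine()) {
                /* Already in a coroutine: call the coroutine version directly. */
                return bdrv_co_writev_vmstate(bs, qiov, pos);
            }
            Coroutine *co = qemu_coroutine_create(bdrv_writev_vmstate_entry, &s);
            qemu_coroutine_enter(co);
            /* Wait for the coroutine to finish; this is the AIO_WAIT_WHILE()/
             * aio_poll() loop from point 1. While polling, it can also run
             * bottom halves scheduled for the polled AioContext (point 2). */
            AIO_WAIT_WHILE(bdrv_get_aio_context(bs), s.ret == -EINPROGRESS);
            return s.ret;
        }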
    
    An example leading to the assertion failure is as follows:
    
    main thread:
    1. A snapshot-save QMP command gets issued.
    2. snapshot_save_job_bh() is scheduled.
    
    vCPU thread:
    3. aio_poll() for the main thread's AioContext is called (e.g. when
    the guest writes to a pflash device, as part of blk_pwrite, which is a
    generated coroutine wrapper).
    4. snapshot_save_job_bh() is executed as part of aio_poll().
    5. qemu_savevm_state() is called.
    6. qemu_mutex_unlock_iothread() is called. Now
    qemu_get_current_aio_context() returns 0x0.
    7. bdrv_writev_vmstate() is executed during the usual savevm setup
    via qemu_fflush(). But this function is a generated coroutine wrapper,
    so it uses AIO_WAIT_WHILE. There, the assertion
    assert(qemu_get_current_aio_context() == qemu_get_aio_context());
    will fail.
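    
    For context, AIO_WAIT_WHILE() (include/block/aio-wait.h) boils down to
    roughly the following. This is a heavily simplified sketch; the waiter
    bookkeeping, memory barriers and AioContext lock handling are elided:
    
        /* Heavily simplified sketch of AIO_WAIT_WHILE() (illustrative only). */
        #define AIO_WAIT_WHILE(ctx, cond) ({                                      \
            AioContext *ctx_ = (ctx);                                             \
            if (ctx_ && in_aio_context_home_thread(ctx_)) {                       \
                while ((cond)) {                                                  \
                    aio_poll(ctx_, true);                                         \
                }                                                                 \
            } else {                                                              \
                /* Otherwise the caller must be the main loop thread. After the   \
                 * BQL has been dropped in a vCPU thread,                         \
                 * qemu_get_current_aio_context() returns NULL, so this is the    \
                 * assertion that fires in step 7 above. */                       \
                assert(qemu_get_current_aio_context() == qemu_get_aio_context()); \
                while ((cond)) {                                                  \
                    aio_poll(qemu_get_aio_context(), true);                       \
                }                                                                 \
            }                                                                     \
        })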
    
    To fix it, ensure that the BQL is held during setup. While it would
    only be needed for snapshots, adapting migration too avoids additional
    logic for conditional locking/unlocking in the setup callbacks.
    Writing the header could (in theory) also trigger qemu_fflush() and
    thus bdrv_writev_vmstate(), so for consistency, the locked section also
    covers the qemu_savevm_state_header() call, even for migration.
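    
    Schematically, the setup phase then has the following shape. This is a
    rough sketch of the intended BQL coverage, not the literal diff; f is
    the outgoing QEMUFile and everything else that happens in between is
    omitted:
    
        /* Rough sketch of the BQL coverage during setup (illustrative only). */
        qemu_mutex_lock_iothread();   /* take the BQL */
        qemu_savevm_state_header(f);  /* may qemu_fflush() -> bdrv_writev_vmstate() */
        qemu_savevm_state_setup(f);   /* setup callbacks now run with the BQL held */
        qemu_mutex_unlock_iothread();
    
    For the snapshot path, the failure scenario above shows that
    qemu_savevm_state() is entered with the BQL already held; there the
    change amounts to no longer dropping the lock around these calls.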
    
    The section around multifd_send_sync_main() needs to be unlocked to
    avoid a deadlock. In particular, the multifd_save_setup() function calls
    socket_send_channel_create() using multifd_new_send_channel_async() as a
    callback and then waits for the callback to signal via the
    channels_ready semaphore. The connection happens via
    qio_task_run_in_thread(), but the callback is only executed via
    qio_task_thread_result() which is scheduled for the main event loop.
    Without unlocking the section, the main thread would never get to
    process the task result and run the callback, meaning there would be no
    signal via the channels_ready semaphore.
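    
    Schematically, the unlocked window looks like this (a simplified
    sketch; error handling and the surrounding setup code are omitted):
    
        /* Sketch of the unlocked window around the multifd sync (illustrative only). */
        qemu_mutex_unlock_iothread();    /* let the main loop run the QIOTask
                                          * completion and the channel callback */
        ret = multifd_send_sync_main(f); /* waits for the channels_ready signal */
        qemu_mutex_lock_iothread();      /* re-take the BQL for the rest of setup */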
    
    The comment in ram_init_bitmaps() was introduced by 49877834
    ("migration: fix incorrect memory_global_dirty_log_start outside BQL")
    and is removed, because it referred to the qemu_mutex_lock_iothread()
    call.
    
    Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
    Reviewed-by: Fabiano Rosas <farosas@suse.de>
    Reviewed-by: Juan Quintela <quintela@redhat.com>
    Signed-off-by: Juan Quintela <quintela@redhat.com>
    Message-ID: <20231013105839.415989-1-f.ebner@proxmox.com>