    migration: Fix race that dest preempt thread close too early · cf02f29e
    Peter Xu authored
    We hit an intermittent CI failure in migration-test, in the unit test
    preempt/plain:
    
    qemu-system-x86_64: Unable to read from socket: Connection reset by peer
    Memory content inconsistency at 5b43000 first_byte = bd last_byte = bc current = 4f hit_edge = 1
    **
    ERROR:../tests/qtest/migration-test.c:300:check_guests_ram: assertion failed: (bad == 0)
    (test program exited with status code -6)
    
    Fabiano debugged it and found that the preempt thread can quit before
    receiving all the pages, which leaves the guest without some of its
    pages and corrupts the guest memory.
    
    To make sure the preempt thread has finished receiving all the pages, we
    can rely on page_requested_count being zero, because the preempt channel
    only receives pages requested by page faults. Note that not every faulted
    page has to be sent via the preempt channel/thread: if a requested page
    has just been queued on the background main channel for migration, the
    src QEMU will still send it via the background channel.
    
    Instead of spinning on the count, we add a condvar so the main thread can
    wait on it when that unusual case happens, without burning the CPU for no
    good reason. The wait should be short, so spinning in this rare case
    would probably be fine; it is just better not to.
    
    The condvar is only used when that special case is triggered.  Some care
    with memory ordering is needed (against the preempt thread status field)
    to guarantee that the main thread always gets a kick when that case
    triggers.
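
    The wait described above can be sketched roughly as follows. This is a
    minimal illustration using plain pthreads and C11 atomics, not the code
    from the patch: the names Incoming, page_received, wait_for_pages and
    run_demo are hypothetical, and the sketch takes the mutex around both
    the decrement and the check so that the ordering is easy to reason
    about.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for the preempt thread status values. */
enum { THREAD_RUNNING, THREAD_QUIT };

/* Hypothetical stand-in for the incoming migration state. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    atomic_int status;      /* THREAD_RUNNING or THREAD_QUIT */
    atomic_int requested;   /* pages requested but not yet received */
} Incoming;

/* Receiver side: account for one received page, kick a waiter if needed. */
static void page_received(Incoming *in)
{
    pthread_mutex_lock(&in->lock);
    int left = atomic_fetch_sub(&in->requested, 1) - 1;
    if (left == 0 && atomic_load(&in->status) == THREAD_QUIT) {
        /* The main thread may be waiting for the last page. */
        pthread_cond_signal(&in->cond);
    }
    pthread_mutex_unlock(&in->lock);
}

/* Simulated receiver thread: "receives" every outstanding page. */
static void *receiver(void *opaque)
{
    Incoming *in = opaque;
    while (atomic_load(&in->requested) > 0) {
        page_received(in);
    }
    return NULL;
}

/* Main thread side: publish the quit request, then wait for count == 0. */
static void wait_for_pages(Incoming *in)
{
    /* Sequentially consistent store: visible before the count is read. */
    atomic_store(&in->status, THREAD_QUIT);

    pthread_mutex_lock(&in->lock);
    while (atomic_load(&in->requested) > 0) {
        /* Sleeps instead of spinning; the loop absorbs spurious wakeups. */
        pthread_cond_wait(&in->cond, &in->lock);
    }
    pthread_mutex_unlock(&in->lock);
}

/* Returns the number of pages still outstanding once the wait returns. */
int run_demo(int pages)
{
    Incoming in;
    pthread_t tid;

    pthread_mutex_init(&in.lock, NULL);
    pthread_cond_init(&in.cond, NULL);
    atomic_init(&in.status, THREAD_RUNNING);
    atomic_init(&in.requested, pages);

    if (pthread_create(&tid, NULL, receiver, &in) != 0) {
        return -1;
    }
    wait_for_pages(&in);
    int left = atomic_load(&in.requested);

    pthread_join(tid, NULL);
    pthread_cond_destroy(&in.cond);
    pthread_mutex_destroy(&in.lock);
    return left;
}
```

    In this sketch the status is written before the count is checked, so a
    receiver that drops the count to zero after the main thread has started
    waiting sees THREAD_QUIT and signals the condvar. If the count is
    already zero, the main thread never sleeps at all.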
    
    Closes: https://gitlab.com/qemu-project/qemu/-/issues/1886
    
    
    Debugged-by: Fabiano Rosas <farosas@suse.de>
    Signed-off-by: Peter Xu <peterx@redhat.com>
    Signed-off-by: Fabiano Rosas <farosas@suse.de>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
    Message-ID: <20230918172822.19052-2-farosas@suse.de>