    cpus: Fix event order on resume of stopped guest · f056158d
    Markus Armbruster authored
    
    
    When resume of a stopped guest immediately runs into block device
    errors, the BLOCK_IO_ERROR event is sent before the RESUME event.
    
    Reproducer:
    
    1. Create a scratch image
       $ dd if=/dev/zero of=scratch.img bs=1M count=100
    
       Size doesn't actually matter.
    
    2. Prepare blkdebug configuration:
    
       $ cat >blkdebug.conf <<EOF
       [inject-error]
       event = "write_aio"
       errno = "5"
       EOF
    
       Note that errno 5 is EIO.
    
    3. Run a guest with an additional scratch disk, i.e. with additional
       arguments
       -drive if=none,id=scratch-drive,format=raw,werror=stop,file=blkdebug:blkdebug.conf:scratch.img
       -device virtio-blk-pci,id=scratch,drive=scratch-drive
    
       The blkdebug part makes all writes to the scratch drive fail with
       EIO.  The werror=stop pauses the guest on write errors.
    
    4. Connect to the QMP socket e.g. like this:
       $ socat UNIX:/your/qmp/socket READLINE,history=$HOME/.qmp_history,prompt='QMP> '
    
       Issue QMP command 'qmp_capabilities':
       QMP> { "execute": "qmp_capabilities" }
    
    5. Boot the guest.
    
    6. In the guest, write to the scratch disk, e.g. like this:
    
       # dd if=/dev/zero of=/dev/vdb count=1
    
       Do double-check the device specified with of= is actually the
       scratch device!
    
    7. Issue QMP command 'cont':
       QMP> { "execute": "cont" }
    
    After step 6, I get a BLOCK_IO_ERROR event followed by a STOP event.  Good.
    
    After step 7, I get BLOCK_IO_ERROR, then RESUME, then STOP.  Not so
    good; I'd expect RESUME, then BLOCK_IO_ERROR, then STOP.
    
    The funny event order confuses libvirt: virsh -r domstate DOMAIN
    --reason reports "paused (unknown)" rather than "paused (I/O error)".
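
To illustrate why the order matters to a management client, here is a
toy model of a client that derives the pause reason from the event
stream. It is a sketch under an assumption of mine, namely that the
client forgets any earlier I/O error when it sees RESUME. It is not
libvirt's actual code, and pause_reason() is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical client-side model, NOT libvirt's implementation.
 * Assumption: a RESUME event clears any I/O error seen before it,
 * and a STOP event is attributed to an I/O error only if one was
 * seen since the last RESUME.
 */
static const char *pause_reason(const char *const *events, size_t n)
{
    int io_error_pending = 0;
    const char *reason = "running";

    for (size_t i = 0; i < n; i++) {
        if (strcmp(events[i], "BLOCK_IO_ERROR") == 0) {
            io_error_pending = 1;
        } else if (strcmp(events[i], "RESUME") == 0) {
            io_error_pending = 0;       /* assumed: RESUME resets state */
            reason = "running";
        } else if (strcmp(events[i], "STOP") == 0) {
            reason = io_error_pending ? "paused (I/O error)"
                                      : "paused (unknown)";
        }
    }
    return reason;
}
```

Fed the order seen after step 7 (BLOCK_IO_ERROR, RESUME, STOP), this
model answers "paused (unknown)". Fed the expected order (RESUME,
BLOCK_IO_ERROR, STOP), it answers "paused (I/O error)".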
    
    The culprit is vm_prepare_start().
    
        /* Ensure that a STOP/RESUME pair of events is emitted if a
         * vmstop request was pending.  The BLOCK_IO_ERROR event, for
         * example, according to documentation is always followed by
         * the STOP event.
         */
        if (runstate_is_running()) {
            qapi_event_send_stop(&error_abort);
            res = -1;
        } else {
            replay_enable_events();
            cpu_enable_ticks();
            runstate_set(RUN_STATE_RUNNING);
            vm_state_notify(1, RUN_STATE_RUNNING);
        }
    
        /* We are sending this now, but the CPUs will be resumed shortly later */
        qapi_event_send_resume(&error_abort);
        return res;
    
    When resuming a stopped guest, we take the else branch before we get
    to sending RESUME.  vm_state_notify() runs virtio_vmstate_change(),
    among other things.  This restarts I/O, triggering the BLOCK_IO_ERROR
    event.
    
    Reshuffle vm_prepare_start() to send the RESUME event earlier.
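
The effect of the reshuffle can be sketched with a small stand-alone
simulation. Everything below is a simplified stand-in for QEMU
internals: the event log, vm_state_notify_stub() and
process_pending_stop() are hypothetical helpers, and
prepare_start_reshuffled() is only my reading of "send the RESUME
event earlier", not necessarily the committed diff.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_EVENTS 8

/* Simplified stand-ins for QEMU state; not the real data structures. */
static const char *event_log[MAX_EVENTS];
static size_t event_count;
static bool running;
static bool stop_pending;

static void emit(const char *name)
{
    if (event_count < MAX_EVENTS) {
        event_log[event_count++] = name;
    }
}

/* Stub: restarting I/O on the faulty disk fails at once (werror=stop). */
static void vm_state_notify_stub(void)
{
    emit("BLOCK_IO_ERROR");
    stop_pending = true;        /* the actual stop happens later */
}

/* Stub for the later point where a pending vmstop request is honored. */
static void process_pending_stop(void)
{
    if (stop_pending) {
        emit("STOP");
        stop_pending = false;
        running = false;
    }
}

/* Order as quoted above: state change first, RESUME last. */
static int prepare_start_old(void)
{
    int res = 0;

    if (running) {
        emit("STOP");
        res = -1;
    } else {
        running = true;
        vm_state_notify_stub();
    }
    emit("RESUME");
    return res;
}

/* Assumed reshuffle: RESUME is sent before the state change. */
static int prepare_start_reshuffled(void)
{
    if (running) {
        emit("STOP");
        emit("RESUME");
        return -1;
    }
    emit("RESUME");
    running = true;
    vm_state_notify_stub();
    return 0;
}

/* Resume a stopped guest with the given variant and record the events. */
static void resume_stopped_guest(int (*prepare)(void))
{
    event_count = 0;
    running = false;
    stop_pending = false;
    prepare();
    process_pending_stop();
}
```

Calling resume_stopped_guest(prepare_start_old) leaves BLOCK_IO_ERROR,
RESUME, STOP in event_log, matching what step 7 shows.  With
prepare_start_reshuffled the log reads RESUME, BLOCK_IO_ERROR, STOP.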
    
    Fixes RHBZ 1566153.
    
    Cc: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Markus Armbruster <armbru@redhat.com>
    Message-Id: <20180423084518.2426-1-armbru@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>