    coroutine: Revert to constant batch size · 9ec7a59b
    Kevin Wolf authored
    Commit 4c41c69e changed the way the coroutine pool is sized because for
    virtio-blk devices with a large queue size and heavy I/O, it was just
    too small and caused coroutines to be deleted and reallocated soon
    afterwards. The change made the size dynamic based on the number of
    queues and the queue size of virtio-blk devices.
    
    There are two important numbers here: slightly simplified, when a
    coroutine terminates, it is generally stored in the global release pool,
    up to a certain pool size; if the pool is full, it is freed instead.
    Conversely, when allocating a new coroutine, coroutines from the release
    pool are reused only if the pool has already reached a certain minimum
    size (the batch size); otherwise, new coroutines are allocated.
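
    Slightly more concretely, the interaction of the two numbers can be
    sketched in C roughly as follows (illustrative identifiers, not
    necessarily QEMU's exact names; the real pool also keeps per-thread
    allocation pools, which are omitted here):

        #include <stdlib.h>

        typedef struct Coroutine Coroutine;
        struct Coroutine {
            Coroutine *next;               /* link in the release pool */
            /* ... stack, entry point, ... */
        };

        static Coroutine *release_pool;        /* list of retired coroutines */
        static unsigned release_pool_size;     /* how many are currently pooled */
        static unsigned pool_max_size = 64;    /* upper bound for pooling */
        static unsigned pool_batch_size = 64;  /* reuse threshold (batch size) */

        /* Termination: keep the coroutine while the pool has room, else free. */
        static void coroutine_delete(Coroutine *co)
        {
            if (release_pool_size < pool_max_size) {
                co->next = release_pool;
                release_pool = co;
                release_pool_size++;
            } else {
                free(co);
            }
        }

        /* Allocation: reuse from the pool only once at least a full batch has
         * accumulated; below that threshold, allocate a new coroutine. */
        static Coroutine *coroutine_new(void)
        {
            if (release_pool_size >= pool_batch_size && release_pool != NULL) {
                Coroutine *co = release_pool;
                release_pool = co->next;
                release_pool_size--;
                return co;
            }
            return calloc(1, sizeof(Coroutine));
        }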
    
    The problem after commit 4c41c69e is that it not only increases the
    maximum pool size (the intended effect), but also the batch size for
    reusing coroutines (a bug). This means that in cases with many devices
    and/or a large queue size (which defaults to the number of vCPUs for
    virtio-blk-pci), many thousands of coroutines could be sitting in the
    release pool without ever being reused.
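
    In terms of the sketch above, the effect of 4c41c69e is as if both
    checks read the same dynamically grown variable (again illustrative, not
    the literal QEMU code):

        /* After 4c41c69e: one dynamic value serves as both limits. */
        static unsigned pool_batch_size = 64;   /* grown per virtio-blk queue */

        static int pool_has_room(unsigned release_pool_size)
        {
            /* Intended effect: a bigger pool. */
            return release_pool_size < pool_batch_size;
        }

        static int pool_may_reuse(unsigned release_pool_size)
        {
            /* The bug: nothing is reused until the pool is completely full. */
            return release_pool_size >= pool_batch_size;
        }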
    
    This is not only a waste of memory and allocations: it also makes the
    QEMU process likely to hit the vm.max_map_count limit on Linux, because
    each coroutine requires two mappings (its stack and the guard page for
    the stack). Once the limit is hit, mprotect() starts to fail with
    ENOMEM, causing QEMU to abort() in qemu_alloc_stack().
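
    The allocation pattern in question looks roughly like this (a simplified
    sketch modeled on qemu_alloc_stack(); error handling and the exact size
    bookkeeping are trimmed):

        #include <stdlib.h>
        #include <sys/mman.h>
        #include <unistd.h>

        /* Each call creates one anonymous mapping; mprotect() on the guard
         * page then splits it into a second mapping, so every live coroutine
         * stack counts twice against vm.max_map_count. */
        static void *alloc_coroutine_stack(size_t sz)
        {
            size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
            void *ptr = mmap(NULL, sz + pagesz, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (ptr == MAP_FAILED) {
                abort();
            }
            /* Guard page below the stack; once vm.max_map_count is exhausted,
             * this mprotect() fails with ENOMEM and the process abort()s. */
            if (mprotect(ptr, pagesz, PROT_NONE) != 0) {
                abort();
            }
            return (char *)ptr + pagesz;   /* usable stack above the guard */
        }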
    
    In order to fix the problem, change the batch size back to 64 to avoid
    uselessly accumulating coroutines in the release pool, but keep the
    dynamic maximum pool size so that coroutines aren't freed too early
    in heavy I/O scenarios.
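
    Schematically, the fix decouples the two numbers again (illustrative
    names; virtio-blk keeps adjusting only the maximum):

        /* The reuse threshold is a constant again ... */
        #define POOL_BATCH_SIZE 64

        /* ... while the maximum pool size stays dynamic. */
        static unsigned pool_max_size = POOL_BATCH_SIZE;

        /* Devices such as virtio-blk raise/lower the bound for their queues. */
        void coroutine_inc_pool_size(unsigned additional)
        {
            pool_max_size += additional;
        }

        void coroutine_dec_pool_size(unsigned removal)
        {
            pool_max_size -= removal;
        }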
    
    Note that this fix doesn't strictly make it impossible to hit the limit,
    but this would only happen if most of the coroutines are actually in use
    at the same time, not just sitting in a pool. This is the same behaviour
    as we already had before commit 4c41c69e. Fully preventing this would
    require allowing qemu_coroutine_create() to return an error, but it
    doesn't seem to be a scenario that people hit in practice.
    
    Cc: qemu-stable@nongnu.org
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2079938
    Fixes: 4c41c69e
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Message-Id: <20220510151020.105528-3-kwolf@redhat.com>
    Tested-by: Hiroki Narukawa <hnarukaw@yahoo-corp.jp>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>