    block: change reqs_lock to QemuMutex · fa9185fc
    Stefan Hajnoczi authored
    CoMutex has poor performance when lock contention is high. The tracked
    requests list is accessed frequently and performance suffers in QEMU
    multi-queue block layer scenarios.
    
    It is not necessary to use CoMutex for the requests lock. The lock is
    always released across coroutine yield operations. It is held for
    relatively short periods of time and it is not beneficial to yield when
    the lock is held by another coroutine.
    
    Change the lock type from CoMutex to QemuMutex to improve multi-queue
    block layer performance. fio randread bs=4k iodepth=64 with 4 IOThreads
    handling a virtio-blk device with 8 virtqueues improves from 254k to
    517k IOPS (+103%, about 2x). Full benchmark results and configuration details are
    available here:
    https://gitlab.com/stefanha/virt-playbooks/-/commit/980c40845d540e3669add1528739503c2e817b57
    
    
    
    In the future we may wish to introduce thread-local tracked requests
    lists to avoid lock contention completely. That would be much more
    involved though.
    
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
    Message-ID: <20230808155852.2745350-3-stefanha@redhat.com>
    Reviewed-by: Eric Blake <eblake@redhat.com>
    Reviewed-by: Kevin Wolf <kwolf@redhat.com>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
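
The locking pattern described in the commit message can be illustrated with a minimal sketch. It is not the QEMU code: `pthread_mutex_t` stands in for `QemuMutex`, and `ReqList`, `TrackedRequest`, `Worker`, `req_list_init()`, `req_list_count()`, `run_workers()` and the simplified `tracked_request_begin()`/`tracked_request_end()` signatures are assumptions made for illustration. Only the `reqs_lock` name comes from the commit. The point it shows is that every critical section is a handful of pointer updates and the lock is never held while the request itself is in flight.

```c
/* Sketch only: pthread_mutex_t stands in for QemuMutex, and all types
 * and helper names below are hypothetical simplifications. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct TrackedRequest {
    long offset;
    long bytes;
    struct TrackedRequest *next;
} TrackedRequest;

typedef struct ReqList {
    pthread_mutex_t reqs_lock;   /* protects head and count */
    TrackedRequest *head;
    size_t count;
} ReqList;

static void req_list_init(ReqList *l)
{
    pthread_mutex_init(&l->reqs_lock, NULL);
    l->head = NULL;
    l->count = 0;
}

/* Link a request into the list; the lock covers only the list update. */
static void tracked_request_begin(ReqList *l, TrackedRequest *req)
{
    pthread_mutex_lock(&l->reqs_lock);
    req->next = l->head;
    l->head = req;
    l->count++;
    pthread_mutex_unlock(&l->reqs_lock);
}

/* Unlink a request; again a short critical section with no blocking. */
static void tracked_request_end(ReqList *l, TrackedRequest *req)
{
    pthread_mutex_lock(&l->reqs_lock);
    for (TrackedRequest **p = &l->head; *p; p = &(*p)->next) {
        if (*p == req) {
            *p = req->next;
            l->count--;
            break;
        }
    }
    pthread_mutex_unlock(&l->reqs_lock);
}

static size_t req_list_count(ReqList *l)
{
    pthread_mutex_lock(&l->reqs_lock);
    size_t n = l->count;
    pthread_mutex_unlock(&l->reqs_lock);
    return n;
}

typedef struct Worker {
    ReqList *list;
    int iterations;
} Worker;

/* Each worker mimics one thread submitting requests to a shared list. */
static void *worker_fn(void *opaque)
{
    Worker *w = opaque;
    for (int i = 0; i < w->iterations; i++) {
        TrackedRequest req = { .offset = i * 4096L, .bytes = 4096 };
        tracked_request_begin(w->list, &req);
        /* The I/O would happen here, with reqs_lock released. */
        tracked_request_end(w->list, &req);
    }
    return NULL;
}

#define MAX_WORKERS 16

/* Run nthreads workers and return how many requests are still tracked. */
static size_t run_workers(int nthreads, int iterations)
{
    assert(nthreads > 0 && nthreads <= MAX_WORKERS);
    ReqList list;
    pthread_t threads[MAX_WORKERS];
    Worker w = { .list = &list, .iterations = iterations };

    req_list_init(&list);
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&threads[i], NULL, worker_fn, &w);
        assert(rc == 0);
    }
    for (int i = 0; i < nthreads; i++) {
        pthread_join(threads[i], NULL);
    }
    size_t remaining = req_list_count(&list);
    pthread_mutex_destroy(&list.reqs_lock);
    return remaining;
}
```

Calling `run_workers(4, 10000)` from a program built with `-pthread` should return 0, since every request that is tracked is also removed. The sketch makes no claim about performance; the figures quoted in the commit message come from the author's fio benchmark, not from code like this.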