-
Stefan Hajnoczi authored
blk_io_plug_call() is invoked outside a blk_io_plug()/blk_io_unplug() section while opening the NVMe drive from: nvme_file_open() -> nvme_init() -> nvme_identify() -> nvme_admin_cmd_sync() -> nvme_submit_command() -> blk_io_plug_call() blk_io_plug_call() immediately invokes the given callback when the current thread is not plugged, as is the case during nvme_file_open(). Unfortunately, nvme_submit_command() calls blk_io_plug_call() with q->lock still held: ... q->sq.tail = (q->sq.tail + 1) % NVME_QUEUE_SIZE; q->need_kick++; blk_io_plug_call(nvme_unplug_fn, q); qemu_mutex_unlock(&q->lock); ^^^^^^^^^^^^^^^^^^^^^^^^^^^ nvme_unplug_fn() deadlocks trying to acquire q->lock because the lock is already acquired by the same thread. The symptom is that QEMU hangs during startup while opening the NVMe drive. Fix this by moving the blk_io_plug_call() outside q->lock. This is safe because no other thread runs code related to this queue and blk_io_plug_call()'s internal state is immune to thread safety issues since it is thread-local. Reported-by:
Lukáš Doktor <ldoktor@redhat.com>
Signed-off-by:
Stefan Hajnoczi <stefanha@redhat.com>
Tested-by:
Lukas Doktor <ldoktor@redhat.com>
Message-id: 20230712191628.252806-1-stefanha@redhat.com
Fixes: f2e59000 ("block/nvme: convert to blk_io_plug_call() API")
Signed-off-by:
Stefan Hajnoczi <stefanha@redhat.com>Stefan Hajnoczi authoredblk_io_plug_call() is invoked outside a blk_io_plug()/blk_io_unplug() section while opening the NVMe drive from: nvme_file_open() -> nvme_init() -> nvme_identify() -> nvme_admin_cmd_sync() -> nvme_submit_command() -> blk_io_plug_call() blk_io_plug_call() immediately invokes the given callback when the current thread is not plugged, as is the case during nvme_file_open(). Unfortunately, nvme_submit_command() calls blk_io_plug_call() with q->lock still held: ... q->sq.tail = (q->sq.tail + 1) % NVME_QUEUE_SIZE; q->need_kick++; blk_io_plug_call(nvme_unplug_fn, q); qemu_mutex_unlock(&q->lock); ^^^^^^^^^^^^^^^^^^^^^^^^^^^ nvme_unplug_fn() deadlocks trying to acquire q->lock because the lock is already acquired by the same thread. The symptom is that QEMU hangs during startup while opening the NVMe drive. Fix this by moving the blk_io_plug_call() outside q->lock. This is safe because no other thread runs code related to this queue and blk_io_plug_call()'s internal state is immune to thread safety issues since it is thread-local. Reported-by:
Lukáš Doktor <ldoktor@redhat.com>
Signed-off-by:
Stefan Hajnoczi <stefanha@redhat.com>
Tested-by:
Lukas Doktor <ldoktor@redhat.com>
Message-id: 20230712191628.252806-1-stefanha@redhat.com
Fixes: f2e59000 ("block/nvme: convert to blk_io_plug_call() API")
Signed-off-by:
Stefan Hajnoczi <stefanha@redhat.com>
Loading