Commits · 3bb5051009a3d740c3978abd5dbafc9f7fff4e45 · Anton / libtcg

Sep 08, 2023


Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Eric Blake <eblake@redhat.com>

0a19d879

io: follow coroutine AioContext in qio_channel_yield() · 06e0f098

Stefan Hajnoczi authored 1 year ago


The ongoing QEMU multi-queue block layer effort makes it possible for multiple
threads to process I/O in parallel. The nbd block driver is not compatible with
the multi-queue block layer yet because QIOChannel cannot be used easily from
coroutines running in multiple threads. This series changes the QIOChannel API
to make that possible.

In the current API, calling qio_channel_attach_aio_context() sets the
AioContext where qio_channel_yield() installs an fd handler prior to yielding:

  qio_channel_attach_aio_context(ioc, my_ctx);
  ...
  qio_channel_yield(ioc); // my_ctx is used here
  ...
  qio_channel_detach_aio_context(ioc);

This API design has limitations: reading and writing must be done in the same
AioContext and moving between AioContexts involves a cumbersome sequence of API
calls that is not suitable for doing on a per-request basis.

There is no fundamental reason why a QIOChannel needs to run within the
same AioContext every time qio_channel_yield() is called. QIOChannel
only uses the AioContext while inside qio_channel_yield(). The rest of
the time, QIOChannel is independent of any AioContext.

In the new API, qio_channel_yield() queries the AioContext from the current
coroutine using qemu_coroutine_get_aio_context(). There is no need to
explicitly attach/detach AioContexts anymore and
qio_channel_attach_aio_context() and qio_channel_detach_aio_context() are gone.
One coroutine can read from the QIOChannel while another coroutine writes from
a different AioContext.

This API change allows the nbd block driver to use QIOChannel from any thread.
It's important to keep in mind that the block driver already synchronizes
QIOChannel access and ensures that two coroutines never read simultaneously or
write simultaneously.

This patch updates all users of qio_channel_attach_aio_context() to the
new API. Most conversions are simple, but vhost-user-server requires a
new qemu_coroutine_yield() call to quiesce the vu_client_trip()
coroutine when not attached to any AioContext.

While the API is has become simpler, there is one wart: QIOChannel has a
special case for the iohandler AioContext (used for handlers that must not run
in nested event loops). I didn't find an elegant way preserve that behavior, so
I added a new API called qio_channel_set_follow_coroutine_ctx(ioc, true|false)
for opting in to the new AioContext model. By default QIOChannel uses the
iohandler AioHandler. Code that formerly called
qio_channel_attach_aio_context() now calls
qio_channel_set_follow_coroutine_ctx(ioc, true) once after the QIOChannel is
created.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20230830224802.493686-5-stefanha@redhat.com>
[eblake: also fix migration/rdma.c]
Signed-off-by: Eric Blake <eblake@redhat.com>

06e0f098

nbd: drop unused nbd_start_negotiate() aio_context argument · 078c8ada

Stefan Hajnoczi authored 1 year ago


aio_context is always NULL, so drop it.

Suggested-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20230830224802.493686-3-stefanha@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

078c8ada

nbd: drop unused nbd_receive_negotiate() aio_context argument · b84ca91c

Stefan Hajnoczi authored 1 year ago


aio_context is always NULL, so drop it.

Suggested-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20230830224802.493686-2-stefanha@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

b84ca91c

Jul 19, 2023

nbd: Use enum for various negotiation modes · bfe04d0a

Eric Blake authored 1 year ago

Deciphering the hard-coded list of integer return values from
nbd_start_negotiate() will only get more confusing when adding support
for 64-bit extended headers. Better is to name things in an enum.
Although the function in question is private to client.c, putting the
enum in a public header and including an enum-to-string conversion
will allow its use in more places in upcoming patches.

The enum is intentionally laid out so that operators like <= can be
used to group multiple modes with similar characteristics, and where
the least powerful mode has value 0, even though this patch does not
exploit that. No semantic change intended.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-9-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

bfe04d0a

nbd/client: Add safety check on chunk payload length · 70fa99f4

Eric Blake authored 1 year ago


Our existing use of structured replies either reads into a qiov capped
at 32M (NBD_CMD_READ) or caps allocation to 1000 bytes (see
NBD_MAX_MALLOC_PAYLOAD in block/nbd.c).  But the existing length
checks are rather late; if we encounter a buggy (or malicious) server
that sends a super-large payload length, we should drop the connection
right then rather than assuming the layer on top will be careful.
This becomes more important when we permit 64-bit lengths which are
even more likely to have the potential for attempted denial of service
abuse.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20230608135653.2918540-8-eblake@redhat.com>

70fa99f4

nbd: s/handle/cookie/ to match NBD spec · 22efd811

Eric Blake authored 1 year ago

Externally, libnbd exposed the 64-bit opaque marker for each client
NBD packet as the "cookie", because it was less confusing when
contrasted with 'struct nbd_handle *' holding all libnbd state.  It
also avoids confusion between the noun 'handle' as a way to identify a
packet and the verb 'handle' for reacting to things like signals.
Upstream NBD changed their spec to favor the name "cookie" based on
libnbd's recommendations[1], so we can do likewise.

[1] https://github.com/NetworkBlockDevice/nbd/commit/ca4392eb2b



Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-6-eblake@redhat.com>
[eblake: typo fix]
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

22efd811

nbd/server: Refactor to pass full request around · 66d4f4fe

Eric Blake authored 1 year ago

Part of NBD's 64-bit headers extension involves passing the client's
requested offset back as part of the reply header (one reason it
stated for this change: converting absolute offsets stored in
NBD_REPLY_TYPE_OFFSET_DATA to relative offsets within the buffer is
easier if the absolute offset of the buffer is also available). This
is a refactoring patch to pass the full request around the reply
stack, rather than just the handle, so that later patches can then
access request->from when extended headers are active. Meanwhile,
this patch enables us to now assert that simple replies are only
attempted when appropriate, and otherwise has no semantic change.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20230608135653.2918540-5-eblake@redhat.com>

66d4f4fe

nbd/server: Prepare for alternate-size headers · a7c8ed36

Eric Blake authored 1 year ago

Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests.  As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2].  Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.

Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions.  Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.

Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.

[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md


as of NBD commit e6f3b94a934

[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction.  But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

a7c8ed36

nbd/client: Use smarter assert · f47b6eab

Eric Blake authored 1 year ago


Assigning strlen() to a uint32_t and then asserting that it isn't too
large doesn't catch the case of an input string 4G in length.
Thankfully, the incoming strings can never be that large: if the
export name or query is reflecting a string the client got from the
server, we already guarantee that we dropped the NBD connection if the
server sent more than 32M in a single reply to our NBD_OPT_* request;
if the export name is coming from qemu, nbd_receive_negotiate()
asserted that strlen(info->name) <= NBD_MAX_STRING_SIZE; and
similarly, a query string via x->dirty_bitmap coming from the user was
bounds-checked in either qemu-nbd or by the limitations of QMP.
Still, it doesn't hurt to be more explicit in how we write our
assertions to not have to analyze whether inadvertent wraparound is
possible.

Fixes: 93676c88 ("nbd: Don't send oversize strings", v4.2.0)
Reported-by: Dr. David Alan Gilbert <dave@treblig.org>
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20230608135653.2918540-2-eblake@redhat.com>

f47b6eab

Jun 05, 2023

bulk: Remove pointless QOM casts · 7d5b0d68

Philippe Mathieu-Daudé authored 1 year ago


Mechanical change running Coccinelle spatch with content
generated from the qom-cast-macro-clean-cocci-gen.py added
in the previous commit.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230601093452.38972-3-philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>

7d5b0d68

May 19, 2023

nbd/server: Fix drained_poll to wake coroutine in right AioContext · 7c1f51bf

Kevin Wolf authored 1 year ago


nbd_drained_poll() generally runs in the main thread, not whatever
iothread the NBD server coroutine is meant to run in, so it can't
directly reenter the coroutines to wake them up.

The code seems to have the right intention, it specifies the correct
AioContext when it calls qemu_aio_coroutine_enter(). However, this
functions doesn't schedule the coroutine to run in that AioContext, but
it assumes it is already called in the home thread of the AioContext.

To fix this, add a new thread-safe qio_channel_wake_read() that can be
called in the main thread to wake up the coroutine in its AioContext,
and use this in nbd_drained_poll().

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230517152834.277483-3-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

7c1f51bf

Apr 25, 2023

nbd: mark more coroutine_fns, do not use co_wrappers · d2223cdd

Paolo Bonzini authored 2 years ago


Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

d2223cdd

Apr 20, 2023

nbd: a BlockExport always has a BlockBackend · dd5b6780

Paolo Bonzini authored 2 years ago


exp->common.blk cannot be NULL, nbd_export_delete() is only called (through
a bottom half) from blk_exp_unref() and in turn that can only happen
after blk_exp_add() has asserted exp->blk != NULL.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

dd5b6780

Apr 04, 2023

nbd/server: Request TCP_NODELAY · f1426881

Eric Blake authored 1 year ago

Nagle's algorithm adds latency in order to reduce network packet
overhead on small packets.  But when we are already using corking to
merge smaller packets into transactional requests, the extra delay
from TCP defaults just gets in the way (see recent commit bd2cd4a4).

For reference, qemu as an NBD client already requests TCP_NODELAY (see
nbd_connect() in nbd/client-connection.c); as does libnbd as a client
[1], and nbdkit as a server [2].  Furthermore, the NBD spec recommends
the use of TCP_NODELAY [3].

[1] https://gitlab.com/nbdkit/libnbd/-/blob/a48a1142/generator/states-connect.c#L39
[2] https://gitlab.com/nbdkit/nbdkit/-/blob/45b72f5b/server/sockets.c#L430
[3] https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md#protocol-phases



CC: Florian Westphal <fw@strlen.de>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20230404004047.142086-1-eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>

f1426881

Mar 27, 2023

nbd/server: push pending frames after sending reply · bd2cd4a4

Florian Westphal authored 2 years ago


qemu-nbd doesn't set TCP_NODELAY on the tcp socket.

Kernel waits for more data and avoids transmission of small packets.
Without TLS this is barely noticeable, but with TLS this really shows.

Booting a VM via qemu-nbd on localhost (with tls) takes more than
2 minutes on my system.  tcpdump shows frequent wait periods, where no
packets get sent for a 40ms period.

Add explicit (un)corking when processing (and responding to) requests.
"TCP_CORK, &zero" after earlier "CORK, &one" will flush pending data.

VM Boot time:
main:    no tls:  23s, with tls: 2m45s
patched: no tls:  14s, with tls: 15s

VM Boot time, qemu-nbd via network (same lan):
main:    no tls:  18s, with tls: 1m50s
patched: no tls:  17s, with tls: 18s

Future optimization: if we could detect if there is another pending
request we could defer the uncork operation because more data would be
appended.

Signed-off-by: Florian Westphal <fw@strlen.de>
Message-Id: <20230324104720.2498-1-fw@strlen.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

bd2cd4a4

Jan 20, 2023

include/block: Untangle inclusion loops · e2c1c34f

Markus Armbruster authored 2 years ago


We have two inclusion loops:

       block/block.h
    -> block/block-global-state.h
    -> block/block-common.h
    -> block/blockjob.h
    -> block/block.h

       block/block.h
    -> block/block-io.h
    -> block/block-common.h
    -> block/blockjob.h
    -> block/block.h

I believe these go back to Emanuele's reorganization of the block API,
merged a few months ago in commit d7e2fe4a.

Fortunately, breaking them is merely a matter of deleting unnecessary
includes from headers, and adding them back in places where they are
now missing.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221221133551.3967339-2-armbru@redhat.com>

e2c1c34f

coroutine: Split qemu/coroutine-core.h off qemu/coroutine.h · 68ba85ce

Markus Armbruster authored 2 years ago


qemu/coroutine.h and qemu/lockable.h include each other.

They need each other only in macro expansions, so we could simply drop
both inclusions to break the loop, and add suitable includes to files
that expand the macros.

Instead, move a part of qemu/coroutine.h to new qemu/coroutine-core.h
so that qemu/coroutine-core.h doesn't need qemu/lockable.h, and
qemu/lockable.h only needs qemu/coroutine-core.h.  Result:
qemu/coroutine.h includes qemu/lockable.h includes
qemu/coroutine-core.h.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221221131435.3851212-5-armbru@redhat.com>
[Semantic rebase conflict with 7c10cb38 "accel/tcg: Add debuginfo
support" resolved]

68ba85ce

Jan 19, 2023

coroutine: Clean up superfluous inclusion of qemu/coroutine.h · 23792478

Markus Armbruster authored 2 years ago


Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20221221131435.3851212-2-armbru@redhat.com>

23792478

Dec 15, 2022

block-backend: replace bdrv_*_above with blk_*_above · ff7e261b

Emanuele Giuseppe Esposito authored 2 years ago


Avoid mixing bdrv_* functions with blk_*, so create blk_* counterparts
for bdrv_block_status_above and bdrv_is_allocated_above.

Note that since blk_co_block_status_above only calls the g_c_w function
bdrv_common_block_status_above and is marked as coroutine_fn, call
directly bdrv_co_common_block_status_above() to avoid using a g_c_w.
Same applies to blk_co_is_allocated_above.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20221128142337.657646-5-eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

ff7e261b

nbd/server.c: add coroutine_fn annotations · 6f58ac55

Emanuele Giuseppe Esposito authored 2 years ago


These functions end up calling bdrv_*() implemented as generated_co_wrapper
functions.
In addition, they also happen to be always called in coroutine context,
meaning all callers are coroutine_fn.
This means that the g_c_w function will enter the qemu_in_coroutine()
case and eventually suspend (or in other words call qemu_coroutine_yield()).
Therefore we can mark such functions coroutine_fn too.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20221128142337.657646-4-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

6f58ac55

Dec 13, 2022

nbd/server: Clean up abuse of BlockExportOptionsNbd member @arg · 8461b4d6

Markus Armbruster authored 2 years ago


block-export-add argument @name defaults to the value of argument
@node-name.

nbd_export_create() implements this by copying @node_name to @name.
It leaves @has_node_name false, violating the "has_node_name ==
!!node_name" invariant.  Unclean.  Falls apart when we elide
@has_node_name (next commit): then QAPI frees the same value twice,
once for @node_name and once @name.  iotest 307 duly explodes.

Goes back to commit c62d24e9 "blockdev-nbd: Boxed argument type for
nbd-server-add" (v5.0.0).  Got moved from qmp_nbd_server_add() to
nbd_export_create() (commit 56ee8626), then copied back (commit
b6076afc).  Commit 8675cbd6 "nbd: Utilize QAPI_CLONE for type
conversion" (v5.2.0) cleaned up the copy in qmp_nbd_server_add()
noting

    Second, our assignment to arg->name is fishy: the generated QAPI code
    for qapi_free_NbdServerAddOptions does not visit arg->name if
    arg->has_name is false, but if it DID visit it, we would have
    introduced a double-free situation when arg is finally freed.

Exactly.  However, the copy in nbd_export_create() remained dirty.

Clean it up.  Since the value stored in member @name is not actually
used outside this function, use a local variable instead of modifying
the QAPI object.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Cc: qemu-block@nongnu.org
Message-Id: <20221104160712.3005652-10-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

8461b4d6

Jul 12, 2022

block: Change blk_{pread,pwrite}() param order · a9262f55

Alberto Faria authored 2 years ago


Swap 'buf' and 'bytes' around for consistency with
blk_co_{pread,pwrite}(), and in preparation to implement these functions
using generated_co_wrapper.

Callers were updated using this Coccinelle script:

    @@ expression blk, offset, buf, bytes, flags; @@
    - blk_pread(blk, offset, buf, bytes, flags)
    + blk_pread(blk, offset, bytes, buf, flags)

    @@ expression blk, offset, buf, bytes, flags; @@
    - blk_pwrite(blk, offset, buf, bytes, flags)
    + blk_pwrite(blk, offset, bytes, buf, flags)

It had no effect on hw/block/nand.c, presumably due to the #if, so that
file was updated manually.

Overly-long lines were then fixed by hand.

Signed-off-by: Alberto Faria <afaria@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220705161527.1054072-4-afaria@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

a9262f55

block: Add a 'flags' param to blk_pread() · 3b35d454

Alberto Faria authored 2 years ago


For consistency with other I/O functions, and in preparation to
implement it using generated_co_wrapper.

Callers were updated using this Coccinelle script:

    @@ expression blk, offset, buf, bytes; @@
    - blk_pread(blk, offset, buf, bytes)
    + blk_pread(blk, offset, buf, bytes, 0)

It had no effect on hw/block/nand.c, presumably due to the #if, so that
file was updated manually.

Overly-long lines were then fixed by hand.

Signed-off-by: Alberto Faria <afaria@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220705161527.1054072-3-afaria@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

3b35d454

Jun 29, 2022

nbd: trace long NBD operations · 8bb100c9

Denis V. Lunev authored 2 years ago


At the moment there are 2 sources of lengthy operations if configured:
* open connection, which could retry inside and
* reconnect of already opened connection
These operations could be quite lengthy and cumbersome to catch thus
it would be quite natural to add trace points for them.

This patch is based on the original downstream work made by Vladimir.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Eric Blake <eblake@redhat.com>
CC: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Hanna Reitz <hreitz@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

8bb100c9

May 12, 2022

nbd/server: Allow MULTI_CONN for shared writable exports · 58a6fdcc

Eric Blake authored 2 years ago


According to the NBD spec, a server that advertises
NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will
not see any cache inconsistencies: when properly separated by a single
flush, actions performed by one client will be visible to another
client, regardless of which client did the flush.

We always satisfy these conditions in qemu - even when we support
multiple clients, ALL clients go through a single point of reference
into the block layer, with no local caching.  The effect of one client
is instantly visible to the next client.  Even if our backend were a
network device, we argue that any multi-path caching effects that
would cause inconsistencies in back-to-back actions not seeing the
effect of previous actions would be a bug in that backend, and not the
fault of caching in qemu.  As such, it is safe to unconditionally
advertise CAN_MULTI_CONN for any qemu NBD server situation that
supports parallel clients.

Note, however, that we don't want to advertise CAN_MULTI_CONN when we
know that a second client cannot connect (for historical reasons,
qemu-nbd defaults to a single connection while nbd-server-add and QMP
commands default to unlimited connections; but we already have
existing means to let either style of NBD server creation alter those
defaults).  This is visible by no longer advertising MULTI_CONN for
'qemu-nbd -r' without -e, as in the iotest nbd-qemu-allocation.

The harder part of this patch is setting up an iotest to demonstrate
behavior of multiple NBD clients to a single server.  It might be
possible with parallel qemu-io processes, but I found it easier to do
in python with the help of libnbd, and help from Nir and Vladimir in
writing the test.

Signed-off-by: Eric Blake <eblake@redhat.com>
Suggested-by: Nir Soffer <nsoffer@redhat.com>
Suggested-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@mail.ru>
Message-Id: <20220512004924.417153-3-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

58a6fdcc

Apr 26, 2022

qapi: nbd-export: allow select bitmaps by node/name pair · e5fb29d5

Vladimir Sementsov-Ogievskiy authored 3 years ago


Hi all! Current logic of relying on search through backing chain is not
safe neither convenient.

Sometimes it leads to necessity of extra bitmap copying. Also, we are
going to add "snapshot-access" driver, to access some snapshot state
through NBD. And this driver is not formally a filter, and of course
it's not a COW format driver. So, searching through backing chain will
not work. Instead of widening the workaround of bitmap searching, let's
extend the interface so that user can select bitmap precisely.

Note, that checking for bitmap active status is not copied to the new
API, I don't see a reason for it, user should understand the risks. And
anyway, bitmap from other node is unrelated to this export being
read-only or read-write.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Message-Id: <20220314213226.362217-3-v.sementsov-og@mail.ru>
[eblake: Adjust S-o-b to Vladimir's new email, with permission]
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

e5fb29d5

Mar 22, 2022

Remove trailing ; after G_DEFINE_AUTO macro · e0e7fe07

Marc-André Lureau authored 3 years ago


The macro doesn't need it.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

e0e7fe07

Replace GCC_FMT_ATTR with G_GNUC_PRINTF · 9edc6313

Marc-André Lureau authored 3 years ago

One less qemu-specific macro. It also helps to make some headers/units
only depend on glib, and thus moved in standalone projects eventually.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>

9edc6313

Mar 08, 2022

nbd/server: Minor cleanups · 314b9026

Eric Blake authored 3 years ago


Spelling fixes, grammar improvements and consistent spacing, noticed
while preparing other patches in this file.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20211203231539.3900865-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

314b9026

Mar 07, 2022

block: pass desired TLS hostname through from block driver client · 046f98d0

Daniel P. Berrangé authored 3 years ago


In

  commit a71d597b
  Author: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
  Date:   Thu Jun 10 13:08:00 2021 +0300

    block/nbd: reuse nbd_co_do_establish_connection() in nbd_open()

the use of the 'hostname' field from the BDRVNBDState struct was
lost, and 'nbd_connect' just hardcoded it to match the IP socket
address. This was a harmless bug at the time since we block use
with anything other than IP sockets.

Shortly though, we want to allow the caller to override the hostname
used in the TLS certificate checks. This is to allow for TLS
when doing port forwarding or tunneling. Thus we need to reinstate
the passing along of the 'hostname'.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220304193610.3293146-3-berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

046f98d0

osdep: Move memalign-related functions to their own header · 5df022cf

Peter Maydell authored 3 years ago


Move the various memalign-related functions out of osdep.h and into
their own header, which we include only where they are used.
While we're doing this, add some brief documentation comments.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20220226180723.1706285-10-peter.maydell@linaro.org

5df022cf

Jan 28, 2022

nbd/server.c: Remove unused field · 523f5a99

Nir Soffer authored 3 years ago


NBDRequestData struct has unused QSIMPLEQ_ENTRY field. It seems that
this field exists since the first git commit and was never used.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Message-Id: <20220111194313.581486-1-nsoffer@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Fixes: d9a73806 ("qemu-nbd: introduce NBDRequest", v1.1)
Signed-off-by: Eric Blake <eblake@redhat.com>

523f5a99

Dec 21, 2021

nbd/client-connection: improve error message of cancelled attempt · 9e14491a
Vladimir Sementsov-Ogievskiy authored 3 years ago
```
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
```
9e14491a

nbd/client-connection: nbd_co_establish_connection(): return real error · 169b9a94

Vladimir Sementsov-Ogievskiy authored 3 years ago


The only caller of nbd_do_establish_connection() that uses errp is
nbd_open(). The only way to cancel this call is through open_timer
timeout. And for this case, user will be more interested in description
of last failed connect rather than in
"Connection attempt cancelled by other operation".

So, let's change behavior on cancel to return previous failure error if
available.

Do the same for non-blocking failure case. In this case we still don't
have a caller that is interested in errp. But let's be consistent.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>

169b9a94

Nov 22, 2021

nbd/server: Simplify zero and trim · e3557422

Eric Blake authored 3 years ago


Now that the block layer supports 64-bit operations (see commit
2800637a and friends, new to v6.2), we no longer have to self-fragment
requests larger than 2G, reverting the workaround added in 890cbccb
("nbd: Fix large trim/zero requests", v5.1.0).

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20211117170230.1128262-3-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

e3557422

nbd/server: Don't complain on certain client disconnects · 1644ccce

Eric Blake authored 3 years ago

When a client disconnects abruptly, but did not have any pending
requests (for example, when using nbdsh without calling h.shutdown),
we used to output the following message:

$ qemu-nbd -f raw file
$ nbdsh -u 'nbd://localhost:10809' -c 'h.trim(1,0)'
qemu-nbd: Disconnect client, due to: Failed to read request: Unexpected end-of-file before all bytes were read

Then in commit f148ae7d, we refactored nbd_receive_request() to use
nbd_read_eof(); when this returns 0, we regressed into tracing
uninitialized memory (if tracing is enabled) and reporting a
less-specific:

qemu-nbd: Disconnect client, due to: Request handling failed in intermediate state

Note that with Unix sockets, we have yet another error message,
unchanged by the 6.0 regression:

$ qemu-nbd -k /tmp/sock -f raw file
$ nbdsh -u 'nbd+unix:///?socket=/tmp/sock

' -c 'h.trim(1,0)'
qemu-nbd: Disconnect client, due to: Failed to send reply: Unable to write to socket: Broken pipe

But in all cases, the error message goes away if the client performs a
soft shutdown by using NBD_CMD_DISC, rather than a hard shutdown by
abrupt disconnect:

$ nbdsh -u 'nbd://localhost:10809' -c 'h.trim(1,0)' -c 'h.shutdown()'

This patch fixes things to avoid uninitialized memory, and in general
avoids warning about a client that does a hard shutdown when not in
the middle of a packet.  A client that aborts mid-request, or which
does not read the full server's reply, can still result in warnings,
but those are indeed much more unusual situations.

CC: qemu-stable@nongnu.org
Fixes: f148ae7d ("nbd/server: Quiesce coroutines on context switch", v6.0.0)
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: defer unrelated typo fixes to later patch]
Message-Id: <20211117170230.1128262-2-eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

1644ccce

Nov 16, 2021

nbd/server: Silence clang sanitizer warning · 76df2b8d

Eric Blake authored 3 years ago


clang's sanitizer is picky: memset(NULL, x, 0) is technically
undefined behavior, even though no sane implementation of memset()
deferences the NULL.  Caught by the nbd-qemu-allocation iotest.

The alternative to checking before each memset is to instead force an
allocation of 1 element instead of g_new0(type, 0)'s behavior of
returning NULL for a 0-length array.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Fixes: 3b1f244c (nbd: Allow export of multiple bitmaps for one device)
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20211115223943.626416-1-eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

76df2b8d

Sep 29, 2021

block/nbd: drop connection_co · 4ddb5d2f

Vladimir Sementsov-Ogievskiy authored 3 years ago


OK, that's a big rewrite of the logic.

Pre-patch we have an always running coroutine - connection_co. It does
reply receiving and reconnecting. And it leads to a lot of difficult
and unobvious code around drained sections and context switch. We also
abuse bs->in_flight counter which is increased for connection_co and
temporary decreased in points where we want to allow drained section to
begin. One of these place is in another file: in nbd_read_eof() in
nbd/client.c.

We also cancel reconnect and requests waiting for reconnect on drained
begin which is not correct. And this patch fixes that.

Let's finally drop this always running coroutine and go another way:
do both reconnect and receiving in request coroutines.

The detailed list of changes below (in the sequence of diff hunks).

1. receiving coroutines are woken directly from nbd_channel_error, when
   we change s->state

2. nbd_co_establish_connection_cancel(): we don't have drain_begin now,
   and in nbd_teardown_connection() all requests should already be
   finished (and reconnect is done from request). So
   nbd_co_establish_connection_cancel() is called from
   nbd_cancel_in_flight() (to cancel the request that is doing
   nbd_co_establish_connection()) and from reconnect_delay_timer_cb()
   (previously we didn't need it, as reconnect delay only should cancel
   active requests not the reconnection itself). But now reconnection
   itself is done in the separate thread (we now call
   nbd_client_connection_enable_retry() in nbd_open()), and we need to
   cancel the requests that wait in nbd_co_establish_connection()
   now).

2A. We do receive headers in request coroutine. But we also should
   dispatch replies for other pending requests. So,
   nbd_connection_entry() is turned into nbd_receive_replies(), which
   does reply dispatching while it receives other request headers, and
   returns when it receives the requested header.

3. All old staff around drained sections and context switch is dropped.
   In details:
   - we don't need to move connection_co to new aio context, as we
     don't have connection_co anymore
   - we don't have a fake "request" of connection_co (extra increasing
     in_flight), so don't care with it in drain_begin/end
   - we don't stop reconnection during drained section anymore. This
     means that drain_begin may wait for a long time (up to
     reconnect_delay). But that's an improvement and more correct
     behavior see below[*]

4. In nbd_teardown_connection() we don't have to wait for
   connection_co, as it is dropped. And cleanup for s->ioc and nbd_yank
   is moved here from removed connection_co.

5. In nbd_co_do_establish_connection() we now should handle
   NBD_CLIENT_CONNECTING_NOWAIT: if new request comes when we are in
   NBD_CLIENT_CONNECTING_NOWAIT, it still should call
   nbd_co_establish_connection() (who knows, maybe the connection was
   already established by another thread in the background). But we
   shouldn't wait: if nbd_co_establish_connection() can't return new
   channel immediately the request should fail (we are in
   NBD_CLIENT_CONNECTING_NOWAIT state).

6. nbd_reconnect_attempt() is simplified: it's now easier to wait for
   other requests in the caller, so here we just assert that fact.
   Also delay time is now initialized here: we can easily detect first
   attempt and start a timer.

7. nbd_co_reconnect_loop() is dropped, we don't need it. Reconnect
   retries are fully handle by thread (nbd/client-connection.c), delay
   timer we initialize in nbd_reconnect_attempt(), we don't have to
   bother with s->drained and friends. nbd_reconnect_attempt() now
   called from nbd_co_send_request().

8. nbd_connection_entry is dropped: reconnect is now handled by
   nbd_co_send_request(), receiving reply is now handled by
   nbd_receive_replies(): all handled from request coroutines.

9. So, welcome new nbd_receive_replies() called from request coroutine,
   that receives reply header instead of nbd_connection_entry().
   Like with sending requests, only one coroutine may receive in a
   moment. So we introduce receive_mutex, which is locked around
   nbd_receive_reply(). It also protects some related fields. Still,
   full audit of thread-safety in nbd driver is a separate task.
   New function waits for a reply with specified handle being received
   and works rather simple:

   Under mutex:
     - if current handle is 0, do receive by hand. If another handle
       received - switch to other request coroutine, release mutex and
       yield. Otherwise return success
     - if current handle == requested handle, we are done
     - otherwise, release mutex and yield

10: in nbd_co_send_request() we now do nbd_reconnect_attempt() if
    needed. Also waiting in free_sema queue we now wait for one of two
    conditions:
    - connectED, in_flight < MAX_NBD_REQUESTS (so we can start new one)
    - connectING, in_flight == 0, so we can call
      nbd_reconnect_attempt()
    And this logic is protected by s->send_mutex

    Also, on failure we don't have to care of removed s->connection_co

11. nbd_co_do_receive_one_chunk(): now instead of yield() and wait for
    s->connection_co we just call new nbd_receive_replies().

12. nbd_co_receive_one_chunk(): place where s->reply.handle becomes 0,
    which means that handling of the whole reply is finished. Here we
    need to wake one of coroutines sleeping in nbd_receive_replies().
    If none are sleeping - do nothing. That's another behavior change: we
    don't have endless recv() in the idle time. It may be considered as
    a drawback. If so, it may be fixed later.

13. nbd_reply_chunk_iter_receive(): don't care about removed
    connection_co, just ping in_flight waiters.

14. Don't create connection_co, enable retry in the connection thread
    (we don't have own reconnect loop anymore)

15. We now need to add a nbd_co_establish_connection_cancel() call in
    nbd_cancel_in_flight(), to cancel the request that is doing a
    connection attempt.

[*], ok, now we don't cancel reconnect on drain begin. That's correct:
    reconnect feature leads to possibility of long-running requests (up
    to reconnect delay). Still, drain begin is not a reason to kill
    long requests. We should wait for them.

    This also means, that we can again reproduce a dead-lock, described
    in 8c517de2.
    Why we are OK with it:
    1. Now this is not absolutely-dead dead-lock: the vm is unfrozen
       after reconnect delay. Actually 8c517de2 fixed a bug in
       NBD logic, that was not described in 8c517de2 and led to
       forever dead-lock. The problem was that nobody woke the free_sema
       queue, but drain_begin can't finish until there is a request in
       free_sema queue. Now we have a reconnect delay timer that works
       well.
    2. It's not a problem of the NBD driver, but of the ide code,
       because it does drain_begin under the global mutex; the problem
       doesn't reproduce when using scsi instead of ide.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210902103805.25686-5-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar and comment tweaks]
Signed-off-by: Eric Blake <eblake@redhat.com>

4ddb5d2f

nbd/client-connection: nbd_co_establish_connection(): fix non set errp · f7ca4aad

Vladimir Sementsov-Ogievskiy authored 3 years ago


When we don't have a connection and blocking is false, we return NULL
but don't set errp. That's wrong.

We have two paths for calling nbd_co_establish_connection():

1. nbd_open() -> nbd_do_establish_connection() -> ...
  but that will never set blocking=false

2. nbd_reconnect_attempt() -> nbd_co_do_establish_connection() -> ...
  but that uses errp=NULL

So, we are safe with our wrong errp policy in
nbd_co_establish_connection(). Still let's fix it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210906190654.183421-2-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

f7ca4aad