  1. Sep 29, 2021
    • block/nbd: drop connection_co · 4ddb5d2f
      Vladimir Sementsov-Ogievskiy authored
      
      OK, that's a big rewrite of the logic.
      
      Pre-patch we have an always-running coroutine, connection_co, which
      does both reply receiving and reconnecting. This leads to a lot of
      difficult and unobvious code around drained sections and context
      switches. We also abuse the bs->in_flight counter: it is increased
      for connection_co and temporarily decreased at the points where we
      want to allow a drained section to begin. One of these places is
      even in another file: nbd_read_eof() in nbd/client.c.
      
      We also cancel the reconnect, and the requests waiting for it, on
      drained_begin, which is not correct; this patch fixes that too.
      
      Let's finally drop this always-running coroutine and take another
      approach: do both reconnecting and receiving in the request
      coroutines.
      
      The detailed list of changes is below (in the order of the diff
      hunks).
      
      1. Receiving coroutines are woken directly from nbd_channel_error(),
         when we change s->state.
      
      2. nbd_co_establish_connection_cancel(): we no longer have
         drain_begin, and in nbd_teardown_connection() all requests should
         already be finished (reconnecting is done from the requests
         themselves). So nbd_co_establish_connection_cancel() is called
         from nbd_cancel_in_flight() (to cancel a request that is inside
         nbd_co_establish_connection()) and from
         reconnect_delay_timer_cb(). Previously we didn't need the latter,
         as the reconnect delay should only cancel active requests, not
         the reconnection itself; but now the reconnection itself is done
         in a separate thread (we now call
         nbd_client_connection_enable_retry() in nbd_open()), and we need
         to cancel the requests waiting in nbd_co_establish_connection().
      
      2A. We now receive reply headers in the request coroutines. But we
         also have to dispatch replies for other pending requests. So
         nbd_connection_entry() is turned into nbd_receive_replies(),
         which dispatches replies while receiving other requests' headers,
         and returns when it receives the requested header.
      
      3. All the old logic around drained sections and context switches is
         dropped. In detail:
         - we don't need to move connection_co to a new aio context, as we
           don't have connection_co anymore
         - we don't have a fake "request" for connection_co (the extra
           in_flight increment), so we don't need to handle it in
           drain_begin/end
         - we no longer stop reconnection during a drained section. This
           means that drain_begin may wait for a long time (up to
           reconnect_delay). But that's an improvement and more correct
           behavior; see [*] below.
      
      4. In nbd_teardown_connection() we no longer have to wait for
         connection_co, as it is dropped. The cleanup of s->ioc and
         nbd_yank is moved here from the removed connection_co.
      
      5. In nbd_co_do_establish_connection() we now have to handle
         NBD_CLIENT_CONNECTING_NOWAIT: if a new request comes while we are
         in NBD_CLIENT_CONNECTING_NOWAIT, it should still call
         nbd_co_establish_connection() (who knows, maybe the connection
         was already established by another thread in the background). But
         it shouldn't wait: if nbd_co_establish_connection() can't return
         a new channel immediately, the request should fail (we are in the
         NBD_CLIENT_CONNECTING_NOWAIT state).
      
      6. nbd_reconnect_attempt() is simplified: it's now easier to wait
         for other requests in the caller, so here we just assert that
         fact. The delay timer is also initialized here now: we can easily
         detect the first attempt and start the timer.
      
      7. nbd_co_reconnect_loop() is dropped; we don't need it. Reconnect
         retries are fully handled by the thread (nbd/client-connection.c),
         the delay timer is initialized in nbd_reconnect_attempt(), and we
         don't have to bother with s->drained and friends.
         nbd_reconnect_attempt() is now called from nbd_co_send_request().
      
      8. nbd_connection_entry() is dropped: reconnecting is now handled by
         nbd_co_send_request() and receiving replies by
         nbd_receive_replies(); everything is handled from the request
         coroutines.
      
      9. So, welcome the new nbd_receive_replies(), called from the
         request coroutines, which receives reply headers instead of
         nbd_connection_entry(). As with sending requests, only one
         coroutine may receive at a time, so we introduce receive_mutex,
         which is locked around nbd_receive_reply(). It also protects some
         related fields. Still, a full audit of thread-safety in the nbd
         driver is a separate task. The new function waits until a reply
         with the specified handle has been received, and works rather
         simply:
      
         Under the mutex:
           - if the current handle is 0, receive by hand. If another
             handle is received, switch to that request's coroutine,
             release the mutex and yield. Otherwise return success
           - if the current handle == the requested handle, we are done
           - otherwise, release the mutex and yield
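
The hand-off described above can be modeled outside QEMU. Below is a standalone pthread sketch, not QEMU's coroutine code: the names, the condition variable, and the simulated wire order are all invented for illustration. Each "request" thread wants the reply with its own handle; whoever finds no reply pending does the receive, and foreign replies are handed over to their owners.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_REQS 4

static pthread_mutex_t receive_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wake = PTHREAD_COND_INITIALIZER;

static int incoming[NUM_REQS] = {3, 1, 4, 2}; /* simulated wire order */
static int next_in;
static int current_handle;                    /* 0: no reply pending */
static int completed;

static int recv_one_handle(void)   /* stands in for nbd_receive_reply() */
{
    return next_in < NUM_REQS ? incoming[next_in++] : -1;
}

static void wait_for_reply(int handle)
{
    pthread_mutex_lock(&receive_mutex);
    for (;;) {
        if (current_handle == handle) {
            current_handle = 0;             /* our reply: consume it */
            pthread_cond_broadcast(&wake);  /* let someone else receive */
            break;
        }
        if (current_handle == 0) {
            current_handle = recv_one_handle(); /* receive "by hand" */
            if (current_handle != handle) {
                pthread_cond_broadcast(&wake);  /* wake the owner... */
                pthread_cond_wait(&wake, &receive_mutex); /* ...and yield */
            }
        } else {
            /* a reply for someone else is pending: just yield */
            pthread_cond_wait(&wake, &receive_mutex);
        }
    }
    completed++;
    pthread_mutex_unlock(&receive_mutex);
}

static void *request_entry(void *arg)
{
    wait_for_reply((int)(size_t)arg);
    return NULL;
}
```

The loop re-checks state under the mutex after every wakeup, so no wakeup can be lost even if a reply arrives before its owner starts waiting.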
      
      10. In nbd_co_send_request() we now do nbd_reconnect_attempt() if
          needed. Also, while waiting in the free_sema queue we now wait
          for one of two conditions:
          - connectED, in_flight < MAX_NBD_REQUESTS (so we can start a
            new request)
          - connectING, in_flight == 0 (so we can call
            nbd_reconnect_attempt())
          This logic is protected by s->send_mutex.
      
          Also, on failure we no longer have to care about the removed
          s->connection_co.
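
The two wake-up conditions can be written as a small predicate. This is a simplified sketch, not the driver's actual code; the state names and MAX_NBD_REQUESTS mirror the driver, but the types are reduced to plain C:

```c
#include <stdbool.h>

#define MAX_NBD_REQUESTS 16

typedef enum {
    NBD_CLIENT_CONNECTED,
    NBD_CLIENT_CONNECTING_WAIT,
    NBD_CLIENT_CONNECTING_NOWAIT,
    NBD_CLIENT_QUIT,
} NBDClientState;

static bool nbd_client_connecting(NBDClientState st)
{
    return st == NBD_CLIENT_CONNECTING_WAIT ||
           st == NBD_CLIENT_CONNECTING_NOWAIT;
}

/* May a request stop waiting in free_sema and proceed?  Checked under
 * s->send_mutex in the real driver. */
static bool nbd_can_proceed(NBDClientState st, int in_flight)
{
    return (st == NBD_CLIENT_CONNECTED && in_flight < MAX_NBD_REQUESTS) ||
           (nbd_client_connecting(st) && in_flight == 0);
}
```

The second condition makes the reconnect attempt wait until all other requests have drained, which is exactly the fact nbd_reconnect_attempt() can now simply assert.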
      
      11. nbd_co_do_receive_one_chunk(): instead of yield()ing and waiting
          for s->connection_co, we now just call the new
          nbd_receive_replies().
      
      12. nbd_co_receive_one_chunk(): the place where s->reply.handle
          becomes 0, which means that handling of the whole reply is
          finished. Here we need to wake one of the coroutines sleeping in
          nbd_receive_replies(); if none are sleeping, do nothing. That's
          another behavior change: we no longer have an endless recv()
          during idle time. That may be considered a drawback; if so, it
          can be fixed later.
      
      13. nbd_reply_chunk_iter_receive(): don't care about the removed
          connection_co, just wake the in_flight waiters.
      
      14. Don't create connection_co; enable retry in the connection
          thread instead (we don't have our own reconnect loop anymore).
      
      15. We now need to add a nbd_co_establish_connection_cancel() call in
          nbd_cancel_in_flight(), to cancel the request that is doing a
          connection attempt.
      
      [*] OK, so now we don't cancel the reconnect on drain_begin. That's
          correct: the reconnect feature leads to the possibility of
          long-running requests (up to the reconnect delay). Still,
          drain_begin is not a reason to kill long requests; we should
          wait for them.
      
          This also means that we can again reproduce the deadlock
          described in 8c517de2.
          Why we are OK with it:
          1. This is no longer an absolute deadlock: the VM is unfrozen
             after the reconnect delay. Actually, 8c517de2 fixed a bug in
             the NBD logic that was not described there and led to a
             permanent deadlock: nobody woke the free_sema queue, but
             drain_begin can't finish while a request is waiting in the
             free_sema queue. Now we have a reconnect delay timer that
             works well.
          2. It's not a problem of the NBD driver but of the IDE code,
             because it does drain_begin under the global mutex; the
             problem doesn't reproduce when using scsi instead of ide.
      
      Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Message-Id: <20210902103805.25686-5-vsementsov@virtuozzo.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      [eblake: grammar and comment tweaks]
      Signed-off-by: Eric Blake <eblake@redhat.com>
  2. Jul 10, 2020
    • nbd: Use ERRP_GUARD() · 795d946d
      Vladimir Sementsov-Ogievskiy authored
      
      If we want to check for an error after a call to a function taking
      errp, we need to introduce a local_err and then propagate it to
      errp. Instead, use the ERRP_GUARD() macro; the benefits are:
      1. No need for an explicit error_propagate() call
      2. No need for an explicit local_err variable: use errp directly
      3. ERRP_GUARD() leaves errp as is if it's not NULL or &error_fatal,
         which means we don't break error_abort (we'll abort on
         error_set, not on error_propagate)
      
      If we want to add some info to errp (by error_prepend() or
      error_append_hint()), we must use the ERRP_GUARD() macro.
      Otherwise, this info will not be added when errp == &error_fatal
      (the program will exit before the error_append_hint() or
      error_prepend() call).  Fix several such cases, e.g. in nbd_read().
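
The pattern the macro replaces can be illustrated with a self-contained mock of the Error API. This is NOT QEMU's real implementation, and ERRP_GUARD() itself appears only as a comment, since the real macro relies on QEMU internals:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Minimal mock of the Error API, just enough to show the local_err dance
 * that ERRP_GUARD() removes. */
typedef struct Error { char msg[256]; } Error;

static void error_setg(Error **errp, const char *msg)
{
    if (errp && !*errp) {
        *errp = calloc(1, sizeof(Error));
        strncpy((*errp)->msg, msg, sizeof((*errp)->msg) - 1);
    }
}

static void error_prepend(Error **errp, const char *pre)
{
    char buf[256];
    if (!errp || !*errp) {
        return;
    }
    snprintf(buf, sizeof(buf), "%s%s", pre, (*errp)->msg);
    memcpy((*errp)->msg, buf, sizeof(buf));
}

static void error_propagate(Error **dst, Error *src)
{
    if (dst && !*dst) {
        *dst = src;
    } else {
        free(src);
    }
}

static int nbd_read_mock(Error **errp)  /* always fails, for illustration */
{
    error_setg(errp, "read failed");
    return -1;
}

/* Pre-patch shape: local_err plus explicit propagation, so that
 * error_prepend() runs even when the caller passed &error_fatal. */
static int old_style(Error **errp)
{
    Error *local_err = NULL;

    if (nbd_read_mock(&local_err) < 0) {
        error_prepend(&local_err, "request header: ");
        error_propagate(errp, local_err);
        return -1;
    }
    return 0;
}

/* Post-patch shape: with ERRP_GUARD() at the top of the real function,
 * errp can be used directly and is re-propagated automatically on
 * return. */
static int new_style(Error **errp)
{
    /* ERRP_GUARD(); */
    if (nbd_read_mock(errp) < 0) {
        error_prepend(errp, "request header: ");
        return -1;
    }
    return 0;
}
```

Both shapes produce the same prepended message for an ordinary errp; the difference only matters for the &error_fatal / &error_abort sentinels, which the mock does not model.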
      
      This commit is generated by command
      
          sed -n '/^Network Block Device (NBD)$/,/^$/{s/^F: //p}' \
              MAINTAINERS | \
          xargs git ls-files | grep '\.[hc]$' | \
          xargs spatch \
              --sp-file scripts/coccinelle/errp-guard.cocci \
              --macro-file scripts/cocci-macro-file.h \
              --in-place --no-show-diff --max-width 80
      
      Reported-by: Kevin Wolf <kwolf@redhat.com>
      Reported-by: Greg Kurz <groug@kaod.org>
      Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Reviewed-by: Markus Armbruster <armbru@redhat.com>
      [Commit message tweaked]
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Message-Id: <20200707165037.1026246-8-armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      [ERRP_AUTO_PROPAGATE() renamed to ERRP_GUARD(), and
      auto-propagated-errp.cocci to errp-guard.cocci.  Commit message
      tweaked again.]
  3. Nov 18, 2019
    • nbd: Don't send oversize strings · 93676c88
      Eric Blake authored
      
      Qemu as server currently won't accept export names larger than 256
      bytes, nor create dirty bitmap names longer than 1023 bytes, so most
      uses of qemu as client or server have no reason to get anywhere near
      the NBD spec maximum of a 4k limit per string.
      
      However, we weren't actually enforcing the limits: we ignored it
      when the remote side violated the protocol on input, and we had
      several code paths where we sent oversize strings on output (for
      example, qemu-nbd --description could easily send more than 4k).
      Tighten things up as follows:
      
      client:
      - Perform bounds checks on the export name and dirty bitmap request
        prior to handing them to the server
      - Validate that copied server replies are not too long (ignoring
        NBD_INFO_* replies that are not copied is not too bad)
      server:
      - Perform bounds check on export name and description prior to
        advertising it to client
      - Reject client name or metadata query that is too long
      - Adjust things to allow full 4k name limit rather than previous
        256 byte limit
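
A minimal sketch of such a bounds check (NBD_MAX_STRING_SIZE is QEMU's name for the NBD spec's 4k per-string limit; the helper name here is invented for illustration):

```c
#include <errno.h>
#include <string.h>

/* The NBD spec caps every transmitted string at 4k; QEMU defines this as
 * NBD_MAX_STRING_SIZE in its headers. */
#define NBD_MAX_STRING_SIZE 4096

/* Returns 0 if the string may be sent, -EINVAL if it exceeds the spec's
 * limit and must be rejected before it reaches the wire. */
static int nbd_check_string_length(const char *s)
{
    return strlen(s) <= NBD_MAX_STRING_SIZE ? 0 : -EINVAL;
}
```

The same check applies symmetrically: the client validates names before sending a request, and the server validates names and descriptions before advertising them.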
      
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20191114024635.11363-4-eblake@redhat.com>
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
  4. Sep 24, 2019
    • nbd/client: Add hint when TLS is missing · 1b5c15ce
      Eric Blake authored
      
      I received an off-list report of failure to connect to an NBD server
      expecting an x509 certificate, when the client was attempting something
      similar to this command line:
      
      $ ./x86_64-softmmu/qemu-system-x86_64 -name 'blah' -machine q35 -nodefaults \
        -object tls-creds-x509,id=tls0,endpoint=client,dir=$path_to_certs \
        -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0,addr=0x6 \
        -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=raw,file=nbd:localhost:9000,werror=stop,rerror=stop,tls-creds=tls0 \
        -device scsi-hd,id=image1,drive=drive_image1,bootindex=0
      qemu-system-x86_64: -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=raw,file=nbd:localhost:9000,werror=stop,rerror=stop,tls-creds=tls0: TLS negotiation required before option 7 (go)
      server reported: Option 0x7 not permitted before TLS
      
      The problem?  As specified, -drive is trying to pass tls-creds to the
      raw format driver instead of the nbd protocol driver, but before we
      get to the point where we can detect that raw doesn't know what to do
      with tls-creds, the nbd driver has already failed because the server
      complained.  The fix to the broken command line?  Pass
      '...,file.tls-creds=tls0' to ensure the tls-creds option is handed to
      nbd, not raw.  But since the error message was rather cryptic, I'm
      trying to improve the error message.
      
      With this patch, the error message adds a line:
      
      qemu-system-x86_64: -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=raw,file=nbd:localhost:9000,werror=stop,rerror=stop,tls-creds=tls0: TLS negotiation required before option 7 (go)
      Did you forget a valid tls-creds?
      server reported: Option 0x7 not permitted before TLS
      
      And with luck, someone grepping for that error message will find this
      commit message and figure out their command line mistake.  Sadly, the
      only mention of file.tls-creds in our docs relates to an --image-opts
      use of PSK encryption with qemu-img as the client, rather than x509
      certificate encryption with qemu-kvm as the client.
      
      CC: Tingting Mao <timao@redhat.com>
      CC: Daniel P. Berrangé <berrange@redhat.com>
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20190907172055.26870-1-eblake@redhat.com>
      [eblake: squash in iotest 233 fix]
      Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
  5. Sep 05, 2019
    • nbd: Tolerate more errors to structured reply request · 5de47735
      Eric Blake authored
      
      A server may have a reason to reject a request for structured replies,
      beyond just not recognizing them as a valid request; similarly, it may
      have a reason for rejecting a request for a meta context.  It doesn't
      hurt us to continue talking to such a server; otherwise 'qemu-nbd
      --list' of such a server fails to display all available details about
      the export.
      
      Encountered when temporarily tweaking nbdkit to reply with
      NBD_REP_ERR_POLICY.  Present since structured reply support was first
      added (commit d795299b reused starttls handling, but starttls is
      different in that we can't fall back to other behavior on any error).
      
      Note that for an unencrypted client trying to connect to a server that
      requires encryption, this defers the point of failure to when we
      finally execute a strict command (such as NBD_OPT_GO or NBD_OPT_LIST),
      now that the intermediate NBD_OPT_STRUCTURED_REPLY does not diagnose
      NBD_REP_ERR_TLS_REQD as fatal; but as the protocol eventually gets us
      to a command where we can't continue onwards, the changed error
      message doesn't cause any security concerns.
      
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20190824172813.29720-3-eblake@redhat.com>
      Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
      [eblake: fix iotest 233]
    • nbd: Use g_autofree in a few places · df18c04e
      Eric Blake authored
      
      Thanks to our recent move to use glib's g_autofree, I can join the
      bandwagon.  Getting rid of gotos is fun ;)
      
      There are probably more places where we could register cleanup
      functions and get rid of more gotos; this patch just focuses on the
      labels that existed merely to call g_free.
      
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20190824172813.29720-2-eblake@redhat.com>
      Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
      Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
  6. Aug 16, 2019
  7. Aug 15, 2019
  8. Apr 08, 2019
  9. Apr 01, 2019
    • nbd/client: Reject inaccessible tail of inconsistent server · 3add3ab7
      Eric Blake authored
      
      The NBD spec suggests that a server should never advertise a size
      inconsistent with its minimum block alignment, as that tail is
      effectively inaccessible to a compliant client obeying those block
      constraints. Since we have a habit of rounding up rather than
      truncating, to avoid losing the last few bytes of user input, and we
      cannot access the tail when the server advertises bogus block sizing,
      abort the connection to alert the server to fix their bug.  And
      rejecting such servers matches what we already did for a min_block
      that was not a power of 2 or which was larger than max_block.
      
      Does not impact either qemu (which always sends properly aligned
      sizes) or nbdkit (which does not send minimum block requirements yet);
      so this is mostly aimed at new NBD server implementations, and ensures
      that the rest of our code can assume the size is aligned.
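
The consistency check can be sketched as a small predicate (names invented; QEMU's real code uses its is_power_of_2() helper and reports the failure via errp before aborting the connection):

```c
#include <stdbool.h>
#include <stdint.h>

/* A server's advertised export size must be a multiple of its advertised
 * minimum block size, and min_block itself must be a nonzero power of 2
 * (the latter was already enforced before this patch). */
static bool nbd_size_is_consistent(uint64_t size, uint32_t min_block)
{
    if (min_block == 0 || (min_block & (min_block - 1)) != 0) {
        return false;             /* not a power of 2: already rejected */
    }
    return size % min_block == 0; /* no inaccessible tail */
}
```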
      
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20190330155704.24191-1-eblake@redhat.com>
      Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
  10. Feb 25, 2019
  11. Feb 04, 2019
  12. Jan 21, 2019
    • nbd/client: Work around 3.0 bug for listing meta contexts · 7c6f5ddc
      Eric Blake authored
      
      Commit 3d068aff forgot to advertise available qemu: contexts
      when the client requests a list with 0 queries. Furthermore,
      3.0 shipped with a qemu-img hack of x-dirty-bitmap (commit
      216ee365) that _silently_ acts as though the entire image is
      clean if a requested bitmap is not present.  Both bugs have
      been recently fixed, so that a modern qemu server gives full
      context output right away, and the client refuses a
      connection if a requested x-dirty-bitmap was not found.
      
      Still, it is likely that there will be users that have to
      work with a mix of old and new qemu versions, depending on
      which features get backported where, at which point being
       able to rely on 'qemu-nbd --list' output to know for sure
      whether a given NBD export has the desired dirty bitmap is
      much nicer than blindly connecting and risking that the
      entire image may appear clean.  We can make our --list code
      smart enough to work around buggy servers by tracking
      whether we've seen any qemu: replies in the original 0-query
      list; if not, repeat with a single query on "qemu:" (which
      may still have no replies, but then we know for sure we
      didn't trip up on the server bug).
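
A sketch of the fallback decision (the helper name is invented; the real code tracks this while parsing the NBD_REP_META_CONTEXT replies to the 0-query NBD_OPT_LIST_META_CONTEXT):

```c
#include <stdbool.h>
#include <string.h>

/* After the initial 0-query listing, decide whether we must repeat the
 * listing with the single query "qemu:", because a buggy 3.0 server
 * omits qemu: contexts from the 0-query reply. */
static bool nbd_need_qemu_requery(const char **replies, int n)
{
    for (int i = 0; i < n; i++) {
        if (strncmp(replies[i], "qemu:", 5) == 0) {
            return false;  /* saw a qemu: context: server is not buggy */
        }
    }
    return true;           /* none seen: re-list with the "qemu:" query */
}
```

Note the asymmetry the commit message describes: the re-query may legitimately return nothing, but then we know the empty result is real rather than an artifact of the server bug.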
      
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Message-Id: <20190117193658.16413-21-eblake@redhat.com>
    • nbd/client: Add meta contexts to nbd_receive_export_list() · 0b576b6b
      Eric Blake authored
      
      We want to be able to detect whether a given qemu NBD server is
      exposing the right export(s) and dirty bitmaps, at least for
      regression testing.  We could use 'nbd-client -l' from the upstream
      NBD project to list exports, but it's annoying to rely on
      out-of-tree binaries; furthermore, nbd-client doesn't necessarily
      know about all of the qemu NBD extensions.  Thus, we plan on adding
      a new mode to qemu-nbd that merely sniffs all possible information
      from the server during handshake phase, then disconnects and dumps
      the information.
      
      This patch continues the work of the previous patch, by adding the
      ability to track the list of available meta contexts into
      NBDExportInfo.  It benefits from the recent refactoring patches
      with a new nbd_list_meta_contexts() that reuses much of the same
      framework as setting a meta context.
      
      Note: a malicious server could exhaust memory of a client by feeding
      an unending loop of contexts; perhaps we could place a limit on how
      many we are willing to receive. But this is no different from our
      earlier analysis on a server sending an unending list of exports,
      and the death of a client due to memory exhaustion when the client
      was going to exit soon anyways is not really a denial of service
      attack.
      
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
      Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Message-Id: <20190117193658.16413-19-eblake@redhat.com>
    • nbd/client: Add nbd_receive_export_list() · d21a2d34
      Eric Blake authored
      
      We want to be able to detect whether a given qemu NBD server is
      exposing the right export(s) and dirty bitmaps, at least for
      regression testing.  We could use 'nbd-client -l' from the upstream
      NBD project to list exports, but it's annoying to rely on
      out-of-tree binaries; furthermore, nbd-client doesn't necessarily
      know about all of the qemu NBD extensions.  Thus, we plan on adding
      a new mode to qemu-nbd that merely sniffs all possible information
      from the server during handshake phase, then disconnects and dumps
      the information.
      
      This patch adds the low-level client code for grabbing the list
      of exports.  It benefits from the recent refactoring patches, in
      order to share as much code as possible when it comes to doing
      validation of server replies.  The resulting information is stored
      in an array of NBDExportInfo, which has been expanded to also hold a
      description string, along with a convenience function for freeing
      the list.
      
      Note: a malicious server could exhaust memory of a client by feeding
      an unending loop of exports; perhaps we should place a limit on how
      many we are willing to receive. But note that a server could
      reasonably be serving an export for every file in a large directory,
      where an arbitrary limit in the client means we can't list anything
      from such a server; the same happens if we just run until the client
      fails to malloc() and thus dies by an abort(), where the limit is
      no longer arbitrary but determined by available memory.  Since the
      client is already planning on being short-lived, it's hard to call
      this a denial of service attack that would starve off other uses,
      so it does not appear to be a security issue.
      
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
      Message-Id: <20190117193658.16413-18-eblake@redhat.com>
      Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
    • nbd/client: Refactor nbd_opt_go() to support NBD_OPT_INFO · 138796d0
      Eric Blake authored
      
      Rename the function to nbd_opt_info_or_go() with an added parameter
      and slight changes to comments and trace messages, in order to
      reuse the function for NBD_OPT_INFO.
      
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20190117193658.16413-17-eblake@redhat.com>
      Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
    • nbd/client: Pull out oldstyle size determination · b3c9d33b
      Eric Blake authored
      
      Another refactoring creating nbd_negotiate_finish_oldstyle()
      for further reuse during 'qemu-nbd --list'.
      
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
      Message-Id: <20190117193658.16413-16-eblake@redhat.com>
      Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
    • nbd/client: Split handshake into two functions · 10b89988
      Eric Blake authored
      
      An upcoming patch will add the ability for qemu-nbd to list
      the services provided by an NBD server.  Share the common
      code of the TLS handshake by splitting the initial exchange
      into a separate function, leaving only the export handling
      in the original function.  Functionally, there should be no
      change in behavior in this patch, although some of the code
      motion may be difficult to follow due to indentation changes
      (view with 'git diff -w' for a smaller changeset).
      
      I considered an enum for the return code coordinating state
      between the two functions, but in the end just settled with
      ample comments.
      
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
      Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Message-Id: <20190117193658.16413-15-eblake@redhat.com>
    • nbd/client: Refactor return of nbd_receive_negotiate() · 2b8d0954
      Eric Blake authored
      
      The function could only ever return 0 or -EINVAL; make this
      clearer by dropping a useless 'fail:' label.
      
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
      Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Message-Id: <20190117193658.16413-14-eblake@redhat.com>
    • nbd/client: Split out nbd_receive_one_meta_context() · 0182c1ae
      Eric Blake authored
      
      Extract portions of nbd_negotiate_simple_meta_context() to
      a new function nbd_receive_one_meta_context() that copies the
      pattern of nbd_receive_list() for performing the argument
      validation of one reply.  The error message when the server
      replies with more than one context changes slightly, but
      that shouldn't happen in the common case.
      
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Message-Id: <20190117193658.16413-13-eblake@redhat.com>
    • nbd/client: Split out nbd_send_meta_query() · 757b3ab9
      Eric Blake authored
      
      Refactor nbd_negotiate_simple_meta_context() to pull out the
      code that can be reused to send a LIST request for 0 or 1 query.
      No semantic change.  The old comment about 'sizeof(uint32_t)'
      being equivalent to '/* number of queries */' is no longer
      needed, now that we are computing 'sizeof(queries)' instead.
      
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
      Message-Id: <20190117193658.16413-12-eblake@redhat.com>
      Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
    • nbd/client: Change signature of nbd_negotiate_simple_meta_context() · 2df94eb5
      Eric Blake authored
      
      Pass 'info' instead of three separate parameters related to info,
      when requesting the server to set the meta context.  Update the
      NBDExportInfo struct to rename the received id field to match the
      fact that we are currently overloading the field to match whatever
      context the user supplied through the x-dirty-bitmap hack, as well
      as adding a TODO comment to remind future patches about a desire
      to request two contexts at once.
      
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
      Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Message-Id: <20190117193658.16413-11-eblake@redhat.com>
    • nbd/client: Move export name into NBDExportInfo · 6dc1667d
      Eric Blake authored
      
      Refactor the 'name' parameter of nbd_receive_negotiate() from
      being a separate parameter into being part of the in-out 'info'.
      This also spills over to a simplification of nbd_opt_go().
      
      The main driver for this refactoring is that an upcoming patch
      would like to add support to qemu-nbd to list information about
      all exports available on a server, where the name(s) will be
      provided by the server instead of the client.  But another benefit
      is that we can now allow the client to explicitly specify the
      empty export name "" even when connecting to an oldstyle server
      (even if qemu is no longer such a server after commit 7f7dfe2a).
      
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
      Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Message-Id: <20190117193658.16413-10-eblake@redhat.com>
    • nbd/client: Refactor nbd_receive_list() · 091d0bf3
      Eric Blake authored
      
      Right now, nbd_receive_list() is only called by
      nbd_receive_query_exports(), which in turn is only called if the
      server lacks NBD_OPT_GO but has working option negotiation, and is
      merely used as a quality-of-implementation trick since servers
      can't give decent errors for NBD_OPT_EXPORT_NAME.  However, servers
      that lack NBD_OPT_GO are becoming increasingly rare (nbdkit was a
      latecomer, in Aug 2018, but qemu has been such a server since commit
      f37708f6 in July 2017 and released in 2.10), so it no longer makes
      sense to micro-optimize that function for performance.
      
      Furthermore, when debugging a server's implementation, tracing the
      full reply (both names and descriptions) is useful, not to mention
      that upcoming patches adding 'qemu-nbd --list' will want to collect
      that data.  And when you consider that a server can send an export
      name up to the NBD protocol length limit of 4k; but our current
      NBD_MAX_NAME_SIZE is only 256, we can't trace all valid server
      names without more storage, but 4k is large enough that the heap
      is better than the stack for long names.
      
      Thus, I'm changing the division of labor, with nbd_receive_list()
      now always malloc'ing a result on success (the malloc is bounded
      by the fact that we reject servers with a reply length larger
      than 32M), and moving the comparison to 'wantname' to the caller.
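      The new contract can be sketched as follows. This is a hypothetical
      helper (the names are illustrative, not the actual qemu code; the 32M
      cap mirrors the reply-length bound mentioned above): it rejects
      oversized replies up front, then always returns a heap-allocated name
      on success, leaving any 'wantname' comparison to the caller.

      ```c
      #include <stdint.h>
      #include <stdlib.h>
      #include <string.h>

      #define MAX_REPLY_LEN (32 * 1024 * 1024)  /* reject replies larger than 32M */

      /*
       * Hypothetical sketch of the post-refactor division of labor: the
       * callee always malloc's the result (bounded by the reply-length
       * check), since 4k names are too large to comfortably keep on the
       * stack; the caller does its own comparison and free.
       */
      static char *read_name(const uint8_t *reply, uint32_t len)
      {
          char *name;

          if (len > MAX_REPLY_LEN) {
              return NULL;          /* server reply too large: refuse */
          }
          name = malloc(len + 1);   /* heap, not stack: names may be 4k */
          if (!name) {
              return NULL;
          }
          memcpy(name, reply, len);
          name[len] = '\0';
          return name;
      }
      ```

      A caller wanting a specific export would then do the comparison
      itself: `char *n = read_name(buf, len); match = n && !strcmp(n, wantname); free(n);`.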
      
      There is a minor change in behavior where a server with 0 exports
      (an immediate NBD_REP_ACK reply) is now no longer distinguished
      from a server without LIST support (NBD_REP_ERR_UNSUP); this
      information could be preserved with a complication to the calling
      contract to provide a bit more information, but I didn't see the
      point.  After all, the worst that can happen if our guess at a
      match is wrong is that the caller will get a cryptic disconnect
      when NBD_OPT_EXPORT_NAME fails (which is no different from what
      would happen if we had not tried LIST), while treating an empty
      list as immediate failure would prevent connecting to really old
      servers that really did lack LIST.  Besides, NBD servers with 0
      exports are rare (qemu can do it when using QMP nbd-server-start
      without nbd-server-add - but qemu understands NBD_OPT_GO and
      thus won't tickle this change in behavior).
      
      Fix the spelling of foundExport to match coding standards while
      in the area.
      
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
      Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Message-Id: <20190117193658.16413-9-eblake@redhat.com>
      091d0bf3
  13. Jan 05, 2019
    • nbd/client: Drop pointless buf variable · ef2e35fc
      Eric Blake authored
      
      There's no need to read into a temporary buffer (oversized
      since commit 7d3123e1) followed by a byteswap into a uint64_t
      to check for a magic number via memcmp(), when the code
      immediately below demonstrates reading into the uint64_t then
      byteswapping in place and checking for a magic number via
      integer math.  What's more, having a different error message
      when the server's first reply byte is 0 is unusual - it's no
      different from any other wrong magic number, and we already
      detected short reads. That whole strlen() issue has been
      present and useless since commit 1d45f8b5 in 2010; perhaps it
      was leftover debugging (since the correct magic number happens
      to be ASCII)?  Make the error messages more consistent and
      detailed while touching things.
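      The preferred pattern looks roughly like this. A self-contained
      sketch, not the actual qemu code: the byte-swap helper stands in for
      qemu's be64_to_cpu(), and the constant is the NBD initial magic,
      which is the ASCII bytes "NBDMAGIC".

      ```c
      #include <stdint.h>
      #include <string.h>

      /* Portable stand-in for qemu's be64_to_cpu() so the sketch is
       * self-contained: reinterpret the value's bytes as big-endian. */
      static uint64_t be64_to_host(uint64_t v)
      {
          const uint8_t *p = (const uint8_t *)&v;
          return ((uint64_t)p[0] << 56) | ((uint64_t)p[1] << 48) |
                 ((uint64_t)p[2] << 40) | ((uint64_t)p[3] << 32) |
                 ((uint64_t)p[4] << 24) | ((uint64_t)p[5] << 16) |
                 ((uint64_t)p[6] << 8)  |  (uint64_t)p[7];
      }

      #define NBD_INIT_MAGIC 0x4e42444d41474943ULL  /* ASCII "NBDMAGIC" */

      /* Read straight into a uint64_t, swap in place, and compare with
       * integer math -- no oversized temporary buffer, no memcmp(), and
       * no special-cased error message for a 0 first byte. */
      static int check_magic(const uint8_t wire[8])
      {
          uint64_t magic;

          memcpy(&magic, wire, sizeof(magic));
          magic = be64_to_host(magic);
          return magic == NBD_INIT_MAGIC;
      }
      ```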
      
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
      Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Message-Id: <20181215135324.152629-9-eblake@redhat.com>
      ef2e35fc
    • qemu-nbd: Fail earlier for -c/-d on non-linux · 3c1fa35d
      Eric Blake authored
      
      Connecting to a /dev/nbdN device is a Linux-specific action.
      We were already masking -c and -d from 'qemu-nbd --help' on
      non-linux.  However, while -d fails with a sensible error
      message, it took hunting through a couple of files to prove
      that.  What's more, the code for -c doesn't fail until after
      it has created a pthread and tried to open a device - possibly
      even printing an error message with %m on a non-Linux platform
      in spite of the comment that %m is glibc-specific.  Make the
      failure happen sooner, then get rid of stubs that are no
      longer needed because of the early exits.
      
      While at it: tweak the blank newlines in --help output to be
      consistent, whether or not built on Linux.
      
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20181215135324.152629-7-eblake@redhat.com>
      Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
      Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      3c1fa35d
    • nbd/client: More consistent error messages · 6c5c0351
      Eric Blake authored
      
      Consolidate on using decimal (not hex), on outputting the
      option reply name (not just value), and a consistent comma between
      clauses, when the client reports protocol discrepancies from the
      server.  While it won't affect normal operation, it makes
      debugging additions easier.
      
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
      Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Message-Id: <20181215135324.152629-6-eblake@redhat.com>
      6c5c0351
  14. Jan 04, 2019
  15. Oct 03, 2018
    • nbd: Don't take address of fields in packed structs · 80c7c2b0
      Peter Maydell authored
      
      Taking the address of a field in a packed struct is a bad idea, because
      it might not be actually aligned enough for that pointer type (and
      thus cause a crash on dereference on some host architectures). Newer
      versions of clang warn about this. Avoid the bug by not using the
      "modify in place" byte swapping functions.
      
      This patch was produced with the following spatch script:
      @@
      expression E;
      @@
      -be16_to_cpus(&E);
      +E = be16_to_cpu(E);
      @@
      expression E;
      @@
      -be32_to_cpus(&E);
      +E = be32_to_cpu(E);
      @@
      expression E;
      @@
      -be64_to_cpus(&E);
      +E = be64_to_cpu(E);
      @@
      expression E;
      @@
      -cpu_to_be16s(&E);
      +E = cpu_to_be16(E);
      @@
      expression E;
      @@
      -cpu_to_be32s(&E);
      +E = cpu_to_be32(E);
      @@
      expression E;
      @@
      -cpu_to_be64s(&E);
      +E = cpu_to_be64(E);
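      A minimal illustration of the bug and the rewritten form (the struct
      and field names here are made up for the example, and the byte-swap
      helper is a portable stand-in for qemu's be32_to_cpu()):

      ```c
      #include <stdint.h>
      #include <string.h>

      /* Portable stand-in for qemu's be32_to_cpu() so the sketch is
       * self-contained: reinterpret the value's bytes as big-endian. */
      static uint32_t be32_to_host(uint32_t v)
      {
          const uint8_t *p = (const uint8_t *)&v;
          return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
      }

      /* A packed wire-format header: 'len' sits at offset 1, so &hdr->len
       * is not suitably aligned for a uint32_t*, and dereferencing such a
       * pointer can crash on strict-alignment hosts (newer clang warns
       * about taking the address at all). */
      struct __attribute__((packed)) wire_hdr {
          uint8_t  type;
          uint32_t len;
      };

      static void fixup(struct wire_hdr *hdr)
      {
          /* Bad (pre-patch pattern):  be32_to_cpus(&hdr->len);
           * Good: read the field by value, swap, assign back.  The
           * compiler emits whatever unaligned accesses the target needs,
           * because it knows the field is in a packed struct. */
          hdr->len = be32_to_host(hdr->len);
      }
      ```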
      
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      Message-Id: <20180927164200.15097-1-peter.maydell@linaro.org>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      [eblake: rebase, and squash in missed changes]
      Signed-off-by: Eric Blake <eblake@redhat.com>
      80c7c2b0
  16. Jul 02, 2018
    • nbd/client: Add x-dirty-bitmap to query bitmap from server · 216ee365
      Eric Blake authored
      
      In order to test that the NBD server is properly advertising
      dirty bitmaps, we need a bare minimum client that can request
      and read the context.  Since feature freeze for 3.0 is imminent,
      this is the smallest workable patch, which replaces the qemu
      block status report with the results of the NBD server's dirty
      bitmap (making it very easy to use 'qemu-img map --output=json'
      to learn where the dirty portions are).  Note that the NBD
      protocol defines a dirty section with the same bit but opposite
      sense that normal "base:allocation" uses to report an allocated
      section; so in qemu-img map output, "data":true corresponds to
      clean, "data":false corresponds to dirty.
      
      A more complete solution that allows dirty bitmaps to be queried
      at the same time as normal block status will be required before
      this addition can lose the x- prefix.  Until then, the fact that
      this replaces normal status with dirty status means actions
      like 'qemu-img convert' will likely misbehave due to treating
      dirty regions of the file as if they are unallocated.
      
      The next patch adds an iotest to exercise this new code.
      
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180702191458.28741-2-eblake@redhat.com>
      216ee365
  17. May 04, 2018
  18. Apr 02, 2018
    • nbd: trace meta context negotiation · 2b53af25
      Eric Blake authored
      
      Having a more detailed log of the interaction between client and
      server is invaluable in debugging how meta context negotiation
      actually works.
      
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180330130950.1931229-1-eblake@redhat.com>
      Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      2b53af25
    • nbd/client: Correctly handle bad server REP_META_CONTEXT · 260e34db
      Eric Blake authored
      
      It's never a good idea to blindly read for size bytes as
      returned by the server without first validating that the size
      is within bounds; a malicious or buggy server could cause us
      to hang or get out of sync from reading further messages.
      
      It may be smarter to try and teach the client to cope with
      unexpected context ids by silently ignoring them instead of
      hanging up on the server, but for now, if the server doesn't
      reply with exactly the one context we expect, it's easier to
      just give up - however, if we give up for any reason other
      than an I/O failure, we might as well try to politely tell
      the server we are quitting rather than continuing.
      
      Fix some typos in the process.
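      The validation rule can be sketched like this (a hypothetical check,
      not the actual qemu code; the 4k cap and the 4-byte context-id size
      are illustrative assumptions): sanity-check the server-provided
      lengths before reading any payload, so a malicious or buggy server
      can't make us hang or lose framing.

      ```c
      #include <stdint.h>

      #define MAX_CTX_REPLY 4096  /* illustrative cap, not qemu's actual limit */

      /*
       * Validate a REP_META_CONTEXT-style reply: 'reply_len' is the
       * payload length the server claimed, which must hold a 4-byte
       * context id plus 'name_len' bytes of context name.  Reject
       * impossible sizes instead of blindly reading for reply_len bytes.
       */
      static int ctx_len_ok(uint32_t reply_len, uint32_t name_len)
      {
          if (reply_len < 4 || reply_len > MAX_CTX_REPLY) {
              return 0;           /* too short for a context id, or absurd */
          }
          if (name_len > reply_len - 4) {
              return 0;           /* name can't fit in the remaining payload */
          }
          return 1;
      }
      ```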
      
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180329231837.1914680-1-eblake@redhat.com>
      Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      260e34db
  19. Mar 13, 2018
  20. Mar 06, 2018
  21. Mar 01, 2018
  22. Jan 10, 2018
  23. Nov 17, 2017
    • nbd/client: Don't hard-disconnect on ESHUTDOWN from server · 01b05c66
      Eric Blake authored
      
      The NBD spec says that a server may fail any transmission request
      with ESHUTDOWN when it is apparent that no further request from
      the client can be successfully honored.  The client is supposed
      to then initiate a soft shutdown (wait for all remaining in-flight
      requests to be answered, then send NBD_CMD_DISC).  However, since
      qemu's server never uses ESHUTDOWN errors, this code was mostly
      untested since its introduction in commit b6f5d3b5.
      
      More recently, I learned that nbdkit as the NBD server is able to
      send ESHUTDOWN errors, so I finally tested this code, and noticed
      that our client was special-casing ESHUTDOWN to cause a hard
      shutdown (immediate disconnect, with no NBD_CMD_DISC), but only
      if the server sends this error as a simple reply.  Further
      investigation found that commit d2febedb introduced a regression
      where structured replies behave differently than simple replies -
      but that the structured reply behavior is more in line with the
      spec (even if we still lack code in nbd-client.c to properly quit
      sending further requests).  So this patch reverts the portion of
      b6f5d3b5 that introduced an improper hard-disconnect special-case
      at the lower level, and leaves the future enhancement of a nicer
      soft-disconnect at the higher level for another day.
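      The distinction the spec draws can be sketched as a tiny decision
      table (a hypothetical sketch of the intended higher-level behavior,
      not qemu's actual code -- the commit notes that the soft-disconnect
      logic in nbd-client.c is still future work):

      ```c
      #include <errno.h>

      enum client_action {
          ACT_RETURN_ERROR,   /* fail this request, keep the connection */
          ACT_SOFT_DISC,      /* drain in-flight requests, then send NBD_CMD_DISC */
          ACT_HARD_DISC       /* drop the socket immediately (what the old
                                 simple-reply special case wrongly did) */
      };

      /*
       * Per-request error handling as the NBD spec intends: ESHUTDOWN
       * means "no further request can succeed", which calls for a soft
       * shutdown, while any other error is an ordinary per-request
       * failure.  Nothing here warrants a hard disconnect.
       */
      static enum client_action on_reply_error(int err)
      {
          switch (err) {
          case ESHUTDOWN:
              return ACT_SOFT_DISC;
          default:
              return ACT_RETURN_ERROR;
          }
      }
      ```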
      
      CC: qemu-stable@nongnu.org
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20171113194857.13933-1-eblake@redhat.com>
      Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      01b05c66