    block/nbd: drop connection_co · 4ddb5d2f
    Vladimir Sementsov-Ogievskiy authored
    
    OK, that's a big rewrite of the logic.
    
    Before this patch we have an always-running coroutine, connection_co,
    which does reply receiving and reconnecting. It leads to a lot of
    difficult and unobvious code around drained sections and context
    switching. We also abuse the bs->in_flight counter, which is increased
    for connection_co and temporarily decreased at points where we want to
    allow a drained section to begin. One of these places is even in
    another file: nbd_read_eof() in nbd/client.c.
    
    We also cancel reconnect and requests waiting for reconnect on drained
    begin, which is not correct. This patch fixes that.
    
    Let's finally drop this always-running coroutine and go another way:
    do both reconnect and receiving in request coroutines.
    
    The detailed list of changes is below (in the order of the diff hunks).
    
    1. Receiving coroutines are woken directly from nbd_channel_error()
       when we change s->state.
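
       A rough sketch of what this looks like; the state values are the
       existing NBDClientState ones, while NBD_CLIENT_QUIT and the
       wake-all helper name are assumptions not spelled out above (the
       real code may differ in detail):

           /* Sketch: on a channel error, update s->state and directly wake
            * request coroutines sleeping in nbd_receive_replies(). */
           static void nbd_channel_error(BDRVNBDState *s, int ret)
           {
               if (ret == -EIO && s->state == NBD_CLIENT_CONNECTED) {
                   s->state = s->reconnect_delay ? NBD_CLIENT_CONNECTING_WAIT
                                                 : NBD_CLIENT_CONNECTING_NOWAIT;
               } else if (ret != -EIO) {
                   s->state = NBD_CLIENT_QUIT;
               }

               nbd_recv_coroutines_wake_all(s);   /* assumed helper name */
           }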
    
    2. nbd_co_establish_connection_cancel(): it is no longer called on
       drain_begin, and in nbd_teardown_connection() all requests should
       already be finished (reconnect is done from a request coroutine).
       So nbd_co_establish_connection_cancel() is called from
       nbd_cancel_in_flight() (to cancel the request that is doing
       nbd_co_establish_connection()) and from reconnect_delay_timer_cb()
       (previously we didn't need it there, as the reconnect delay should
       only cancel active requests, not the reconnection itself). But now
       reconnection itself is done in a separate thread (we now call
       nbd_client_connection_enable_retry() in nbd_open()), and we need to
       cancel the requests that wait in nbd_co_establish_connection().
    
    2A. We now receive headers in the request coroutine. But we should
       also dispatch replies for other pending requests. So
       nbd_connection_entry() is turned into nbd_receive_replies(), which
       dispatches replies while it receives headers for other requests,
       and returns when it receives the requested header.
    
    3. All the old stuff around drained sections and context switching is
       dropped. In detail:
       - we don't need to move connection_co to a new aio context, as we
         don't have connection_co anymore
       - we don't have a fake "request" of connection_co (the extra
         in_flight increase), so we don't need to care about it in
         drain_begin/end
       - we don't stop reconnection during drained sections anymore. This
         means that drain_begin may wait for a long time (up to
         reconnect_delay). But that's an improvement and more correct
         behavior; see [*] below.
    
    4. In nbd_teardown_connection() we don't have to wait for
       connection_co, as it is dropped. The cleanup of s->ioc and nbd_yank
       is moved here from the removed connection_co.
    
    5. nbd_co_do_establish_connection() now has to handle
       NBD_CLIENT_CONNECTING_NOWAIT: if a new request comes while we are
       in NBD_CLIENT_CONNECTING_NOWAIT, it should still call
       nbd_co_establish_connection() (who knows, maybe the connection was
       already established by the background thread). But we shouldn't
       wait: if nbd_co_establish_connection() can't return a new channel
       immediately, the request should fail (we are in
       NBD_CLIENT_CONNECTING_NOWAIT state).
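
       A minimal sketch of that handling. The boolean "blocking" flag on
       nbd_co_establish_connection() and the conn/info/sioc fields are
       assumptions about how this is expressed; the exact signature in
       nbd/client-connection.c may differ:

           /* Sketch: inside nbd_co_do_establish_connection(). */
           bool blocking = s->state == NBD_CLIENT_CONNECTING_WAIT;

           /* Try anyway: the background thread may already have a connection. */
           s->sioc = nbd_co_establish_connection(s->conn, &s->info, blocking, errp);
           if (!s->sioc) {
               /* In NOWAIT state we must not block, so the request just fails. */
               return -ECONNREFUSED;
           }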
    
    6. nbd_reconnect_attempt() is simplified: it's now easier to wait for
       other requests in the caller, so here we just assert that fact.
       The delay timer is also initialized here: we can easily detect the
       first attempt and start the timer.
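
       Roughly, with reconnect_delay_timer_init() as the timer helper
       behind reconnect_delay_timer_cb(); its exact arguments here are an
       assumption, not taken from the patch:

           /* Sketch: detect the first attempt and arm the delay timer. */
           static void coroutine_fn nbd_reconnect_attempt(BDRVNBDState *s)
           {
               /* The caller has already waited for all other requests. */
               assert(nbd_client_connecting(s));

               if (!s->reconnect_delay_timer) {
                   /* First attempt after the channel broke: the timer will
                    * flip us to NBD_CLIENT_CONNECTING_NOWAIT on expiry. */
                   reconnect_delay_timer_init(s,
                       qemu_clock_get_ns(QEMU_CLOCK_REALTIME) +
                       s->reconnect_delay * NANOSECONDS_PER_SECOND);
               }

               nbd_co_do_establish_connection(s->bs, NULL);
           }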
    
    7. nbd_co_reconnect_loop() is dropped, we don't need it. Reconnect
       retries are fully handled by the connection thread
       (nbd/client-connection.c), the delay timer is initialized in
       nbd_reconnect_attempt(), and we don't have to bother with
       s->drained and friends. nbd_reconnect_attempt() is now called from
       nbd_co_send_request().
    
    8. nbd_connection_entry() is dropped: reconnect is now handled by
       nbd_co_send_request(), and receiving replies is now handled by
       nbd_receive_replies(); everything is handled from request
       coroutines.
    
    9. So, welcome the new nbd_receive_replies(), called from a request
       coroutine, which receives the reply header instead of
       nbd_connection_entry(). As with sending requests, only one
       coroutine may receive at a time, so we introduce receive_mutex,
       which is locked around nbd_receive_reply(). It also protects some
       related fields. Still, a full audit of thread-safety in the nbd
       driver is a separate task. The new function waits for a reply with
       the specified handle to be received, and works rather simply:

       Under the mutex:
         - if the current handle is 0, do the receive by hand. If another
           handle is received, switch to the other request coroutine,
           release the mutex and yield. Otherwise, return success
         - if the current handle == the requested handle, we are done
         - otherwise, release the mutex and yield
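
       A sketch of that loop using QEMU's coroutine primitives. The
       requests[]/receiving/coroutine fields and HANDLE_TO_INDEX() come
       from block/nbd.c, but the error handling and the exact locking of
       the real function are simplified here:

           /* Sketch: wait until the reply header for @handle is received. */
           static int coroutine_fn nbd_receive_replies(BDRVNBDState *s,
                                                       uint64_t handle)
           {
               int ret = 0;

               qemu_co_mutex_lock(&s->receive_mutex);
               while (true) {
                   if (s->reply.handle == handle) {
                       break;                    /* our header has arrived */
                   }

                   if (s->reply.handle != 0) {
                       /* A header for another request is pending; its owner
                        * will consume it and wake us later (see item 12). */
                       s->requests[HANDLE_TO_INDEX(s, handle)].receiving = true;
                       qemu_co_mutex_unlock(&s->receive_mutex);
                       qemu_coroutine_yield();
                       qemu_co_mutex_lock(&s->receive_mutex);
                       continue;
                   }

                   /* Nobody is receiving: read the next header by hand. */
                   if (nbd_receive_reply(s->bs, s->ioc, &s->reply, NULL) < 0) {
                       ret = -EIO;
                       break;
                   }
                   if (s->reply.handle != handle) {
                       /* It belongs to another request: wake its coroutine,
                        * then loop and go to sleep ourselves. */
                       aio_co_wake(
                           s->requests[HANDLE_TO_INDEX(s, s->reply.handle)].coroutine);
                   }
               }
               qemu_co_mutex_unlock(&s->receive_mutex);
               return ret;
           }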
    
    10. In nbd_co_send_request() we now do nbd_reconnect_attempt() if
        needed. Also, while waiting in the free_sema queue, we now wait
        for one of two conditions:
        - connectED and in_flight < MAX_NBD_REQUESTS (so we can start a
          new request)
        - connectING and in_flight == 0 (so we can call
          nbd_reconnect_attempt())
        This logic is protected by s->send_mutex.

        Also, on failure we don't have to care about the removed
        s->connection_co.
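
        A sketch of that send-side wait (the field and helper names,
        state, in_flight, free_sema and nbd_client_connecting(), are the
        ones used by block/nbd.c; error handling is omitted):

            /* Sketch: waiting for a free slot or the right to reconnect,
             * inside nbd_co_send_request(). */
            qemu_co_mutex_lock(&s->send_mutex);

            while (true) {
                if (s->state == NBD_CLIENT_CONNECTED &&
                    s->in_flight < MAX_NBD_REQUESTS) {
                    break;                      /* connected, slot available */
                }
                if (nbd_client_connecting(s) && s->in_flight == 0) {
                    nbd_reconnect_attempt(s);   /* we alone may reconnect now */
                    continue;                   /* re-check the state */
                }
                qemu_co_queue_wait(&s->free_sema, &s->send_mutex);
            }

            s->in_flight++;
            /* ... pick a request slot and send the request header ... */
            qemu_co_mutex_unlock(&s->send_mutex);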
    
    11. nbd_co_do_receive_one_chunk(): instead of yielding and waiting
        for s->connection_co, we now just call the new
        nbd_receive_replies().
    
    12. nbd_co_receive_one_chunk(): this is the place where
        s->reply.handle becomes 0, which means that handling of the whole
        reply is finished. Here we need to wake one of the coroutines
        sleeping in nbd_receive_replies(). If none are sleeping, do
        nothing. That's another behavior change: we no longer have an
        endless recv() during idle time. That may be considered a
        drawback; if so, it may be fixed later.
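
        For illustration, a helper along these lines could do the waking
        (the helper name is hypothetical; the requests[] bookkeeping
        matches the description in item 9):

            /* Sketch: wake at most one coroutine sleeping in
             * nbd_receive_replies(); if none are sleeping, do nothing. */
            static void nbd_recv_coroutine_wake_one(BDRVNBDState *s)
            {
                int i;

                for (i = 0; i < MAX_NBD_REQUESTS; i++) {
                    if (s->requests[i].receiving) {
                        s->requests[i].receiving = false;
                        aio_co_wake(s->requests[i].coroutine);
                        return;
                    }
                }
            }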
    
    13. nbd_reply_chunk_iter_receive(): don't care about the removed
        connection_co, just ping the in_flight waiters.
    
    14. Don't create connection_co; enable retry in the connection thread
        (we don't have our own reconnect loop anymore).
    
    15. We now need to add a nbd_co_establish_connection_cancel() call in
        nbd_cancel_in_flight(), to cancel the request that is doing a
        connection attempt.
    
    [*] OK, now we don't cancel reconnect on drain begin. That's correct:
        the reconnect feature leads to the possibility of long-running
        requests (up to the reconnect delay). Still, drain begin is not a
        reason to kill long requests; we should wait for them.

        This also means that we can again reproduce the deadlock
        described in 8c517de2.
        Why we are OK with it:
        1. This is no longer an absolutely-dead deadlock: the vm is
           unfrozen after the reconnect delay. Actually 8c517de2 fixed a
           bug in the NBD logic that was not described in 8c517de2 and
           led to a permanent deadlock. The problem was that nobody woke
           the free_sema queue, but drain_begin can't finish until there
           is a request in the free_sema queue. Now we have a reconnect
           delay timer that works well.
        2. It's not a problem of the NBD driver, but of the IDE code,
           because it does drain_begin under the global mutex; the
           problem doesn't reproduce when using scsi instead of ide.
    
    Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
    Message-Id: <20210902103805.25686-5-vsementsov@virtuozzo.com>
    Reviewed-by: Eric Blake <eblake@redhat.com>
    [eblake: grammar and comment tweaks]
    Signed-off-by: Eric Blake <eblake@redhat.com>