    block/nbd: fix reconnect-delay (46f56631)
    Author: Vladimir Sementsov-Ogievskiy
    
    
    reconnect-delay has a design flaw: we handle it in the same loop where
    we make connection attempts. So reconnect-delay may be exceeded by the
    unpredictable duration of a connection attempt.
    
    Let's use a separate timer instead.
    
    How to reproduce the bug:
    
    1. Create an image on node1:
       qemu-img create -f qcow2 xx 100M
    
    2. Start NBD server on node1:
       qemu-nbd xx
    
    3. On node2 start qemu-io:
    
       ./build/qemu-io --image-opts \
       driver=nbd,server.type=inet,server.host=192.168.100.5,server.port=10809,reconnect-delay=15
    
    4. Type 'read 0 512' in the qemu-io interface to check that the
       connection works.
    
    Be careful: steps 5-7 must be completed quickly, in less than 15
    seconds.
    
    5. Kill the NBD server on node1.
    
    6. Run 'read 0 512' in the qemu-io interface again, to make sure that
       the NBD client enters the reconnect loop.
    
    7. On node1, run the following command:
    
       sudo iptables -A INPUT -p tcp --dport 10809 -j DROP
    
    This will make the connect() call of qemu-io at node2 take a long time.
    
    You'll see that the read command in qemu-io hangs for a long time,
    longer than the 15 seconds specified by the reconnect-delay parameter.
    That is the bug.
    
    8. Don't forget to drop the iptables rule on node1:
    
       sudo iptables -D INPUT -p tcp --dport 10809 -j DROP
    
    Important note: Step [5] is necessary to reproduce _this_ bug. If
    step [5] is skipped, the read command (step 6) will also hang for a
    long time, and this commit doesn't help: the hang then comes not from a
    long connect() to an unreachable host but from a long sendmsg() to it.
    That case should be fixed by enabling and adjusting keep-alive on the
    socket, which is a topic for a further patch set.
    
    Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
    Message-Id: <20200903190301.367620-4-vsementsov@virtuozzo.com>
    Reviewed-by: Eric Blake <eblake@redhat.com>
    Signed-off-by: Eric Blake <eblake@redhat.com>